where is the code for read system call?

2007-07-20 Thread Agarwal, Lomesh
My application reads from a socket. I need to change the behavior of the read
system call for an experiment. Can someone point me to the code?

thanks
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:

On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
 On Fri, 20 Jul 2007 15:50:47 -0700
 Greg KH [EMAIL PROTECTED] wrote:

  On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
Hi Greg,
  
This looks like a sysfs bug
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
  
l *kernel_param_sysfs_setup+0x75
0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
565 mk->mod = THIS_MODULE;
566 kobj_set_kset_s(mk, module_subsys);
567 kobject_set_name(&mk->kobj, name);
568 kobject_init(&mk->kobj);
569 ret = kobject_add(&mk->kobj);
570 BUG_ON(ret < 0);
571 param_sysfs_setup(mk, kparam, num_params, name_skip);
572 kobject_uevent(&mk->kobj, KOBJ_ADD);
573 }
574
  

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
 
  What kernel version is this happening on?  The -mm tree?  Can you try
  Linus's tree instead?
 
  It looks like there was some needed information right before the first
  stack dump, showing exactly what kobject was trying to be added that was
  already present.  Odds are this is a kernel parameter with the same name
  as a duplicate one within the same module,


I don't think that's an -EEXIST.

I think what we have here is kobject_add() exiting with -EINVAL.
(kobject attempted to be registered with no name!)

[ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
we would've seen an offset in kobject_shadow_add closer to 0x189,
because the dump_stack() for EEXIST is barely 4 instructions before
we return from that function. ]


  but the trick is going to be
  trying to figure out what module is causing this.


So I'd guess we want to search for a module that's passing a kobject *
to kobject_add() such that !kobj->k_name is true.


  So it's not a sysfs bug, but rather a driver issue that this is
  catching.

 In that case a BUG was way too harsh treatment, and in fact directly
 contributed to our inability to debug the bug!

 Can we wind that back a bit?  Add some useful printks and then recover
 in some fashion?
[...]
So I'm guessing he was trying to catch something specific here.


Considering that:

(1) This isn't a bug that should bring down the kernel that hard, and,
(2) kobject_shadow_add() seems to be dumping enough stacks and
printing printk's on errors already,

I'd suggest we just get rid of the BUG_ON() in kernel_param_sysfs_setup().
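
A minimal userspace sketch of that suggestion, under stated assumptions: `fake_kobject`, `fake_kobject_add()` and `setup_one()` are hypothetical stand-ins, not the real kobject API. It only illustrates reporting the -EINVAL and carrying on instead of halting on it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kobject; only the name matters here. */
struct fake_kobject {
	const char *k_name;
};

/* Mimics the failure discussed above: refuse to register a nameless object. */
static int fake_kobject_add(const struct fake_kobject *kobj)
{
	if (!kobj->k_name || !kobj->k_name[0])
		return -EINVAL;
	return 0;
}

/*
 * Instead of BUG_ON(ret < 0): report the failure, skip this object and
 * let the caller carry on. Returns 0 on success, negative errno otherwise.
 */
static int setup_one(const struct fake_kobject *kobj)
{
	int ret = fake_kobject_add(kobj);

	if (ret < 0) {
		fprintf(stderr, "param sysfs setup failed for '%s': %d\n",
			kobj->k_name ? kobj->k_name : "(null)", ret);
		return ret;
	}
	return 0;
}
```

With this shape an unnamed object costs one log line and a missing sysfs directory, and the message still names the offender.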


Satyam


Re: [PATCH] x86: Create clflush() inline, remove hardcoded wbinvd

2007-07-20 Thread H. Peter Anvin
Glauber de Oliveira Costa wrote:
 On Fri, 2007-07-20 at 14:19 -0700, H. Peter Anvin wrote:
 Create an inline function for clflush(), with the proper arguments,
 and use it instead of hard-coding the instruction.

 This also removes one instance of hard-coded wbinvd, based on a patch
 by Bauder de Oliveira Costa.
 Hey, Who's that guy that got a name so close to mine? ;-)

That would be Mr. Typo!

 Cc: Andi Kleen [EMAIL PROTECTED]
 Cc: Glauber de Oliveira Costa [EMAIL PROTECTED]

I got it right here at least :-/

-hpa
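
For readers who have not seen the patch, a rough userspace sketch of what an inline wrapper around the instruction can look like. The name `clflush_sketch()` and the non-x86-64 fallback are assumptions for illustration; this is not the code from the patch.

```c
/*
 * Flush the cache line containing *p. On x86-64 this emits CLFLUSH with
 * the pointed-to byte as a read/write memory operand, so the compiler
 * knows which memory the instruction touches. On other targets the
 * sketch compiles to a no-op placeholder.
 */
static inline void clflush_sketch(volatile void *p)
{
#if defined(__x86_64__)
	__asm__ volatile("clflush %0" : "+m" (*(volatile char *)p));
#else
	(void)p;
#endif
}
```

Passing the target as a proper memory operand, rather than hard-coding a register in the asm string, is what lets callers use it on any address.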


Re: [PATCH] compat_ioctl requires CONFIG_BLOCK

2007-07-20 Thread Arnd Bergmann
On Saturday 21 July 2007, Sebastian Siewior wrote:
 
 Got with randconfig
 include/linux/loop.h:66: error: expected specifier-qualifier-list before
 'request_queue_t'
 make[1]: *** [fs/compat_ioctl.o] Error 1
 
 parts of compat ioctl require CONFIG_BLOCK to be set.
 
 Signed-off-by: Sebastian Siewior [EMAIL PROTECTED]
 Index: b/fs/compat_ioctl.c
 ===
 --- a/fs/compat_ioctl.c
 +++ b/fs/compat_ioctl.c
 @@ -63,7 +63,9 @@
  #include <linux/wireless.h>
  #include <linux/atalk.h>
  #include <linux/blktrace_api.h>
 +#ifdef CONFIG_BLOCK
  #include <linux/loop.h>
 +#endif

Adding #ifdef around an #include is considered bad style. Better just
make loop.h compile without any conditionals. Does the below
patch work for you?

Arnd 

--- a/include/linux/loop.h
+++ b/include/linux/loop.h
@@ -63,7 +63,7 @@ struct loop_device {
 	struct task_struct	*lo_thread;
 	wait_queue_head_t	lo_event;
 
-	request_queue_t		*lo_queue;
+	struct request_queue	*lo_queue;
 	struct gendisk		*lo_disk;
 	struct list_head	lo_list;
 };
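
Why the struct spelling helps, as a small standalone sketch: a typedef name such as request_queue_t must already be declared where it is used, while a pointer to `struct request_queue` is valid even if the struct is never defined in that translation unit. `loop_device_sketch` and `loop_has_queue()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * No block-layer header is included and struct request_queue is never
 * defined here. The forward declarations below are all that is needed
 * to hold pointers to these types.
 */
struct request_queue;
struct gendisk;

/* Hypothetical cut-down stand-in for struct loop_device. */
struct loop_device_sketch {
	struct request_queue *lo_queue;	/* pointer to an incomplete type */
	struct gendisk *lo_disk;
};

/* Storing or comparing the pointer never needs the full definition. */
static int loop_has_queue(const struct loop_device_sketch *lo)
{
	return lo->lo_queue != NULL;
}
```

Only code that dereferences lo_queue needs the real definition, so the header itself no longer depends on CONFIG_BLOCK.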


Re: [PATCH v3] pcmcia: CompactFlash driver for PA Semi Electra boards

2007-07-20 Thread Andrew Morton
On Thu, 5 Jul 2007 09:49:14 -0500
[EMAIL PROTECTED] (Olof Johansson) wrote:

 Driver for the CompactFlash slot on the PA Semi Electra eval board. It's
 a simple device sitting on localbus, with interrupts and detect/voltage
 control over GPIO.
 
 The driver is implemented as an of_platform driver, and adds localbus
 as a bus being probed by the of_platform framework.
 
 
 Signed-off-by: Olof Johansson [EMAIL PROTECTED]
 
 ---
 
 On Mon, Jun 25, 2007 at 03:43:41PM -0500, olof wrote:
 
  The ifdef is needed since for CONFIG_PCMCIA=n builds, the bus notifier
  isn't available. I wanted to do the bus notifier registration explicitly
  before the of_platform bus probe to avoid later surprises due to reordered
  initcalls in case it was split up in its own initcall.
  
  I could add the code under ifdef as well, but it didn't seem too
  critical. Once the second major board comes along I'll probably move it
  out to a per-board file, there's no real need for it just yet.
 
 Alright, turns out I still need to declare the extern bus type, which would
 mean two #ifdefs in one function. Moving it out instead.
 
 I've addressed Milton's comments as well.
 
 Who's maintaining PCMCIA? MAINTAINERS only lists a mailing list, no person.
 Seems weird for a component that's marked as maintained.

Dominik Brodowski.  He's having a bit of downtime at present (exams, I
think).  He expects to return.  Meanwhile, cc'ing me usually has some
effect.


 ...

 +static const char driver_name[] = "electra-cf";

 ...

 +static struct of_device_id electra_cf_match[] =
 +{
 + {
 + .compatible   = "electra-cf",
 + },
 + {},
 +};

Could have reused driver_name[] here, if that was appropriate.

 +static struct of_platform_driver electra_cf_driver =
 +{
 + .name  = (char *)driver_name,

ug.  But it's not your fault - we should have always made it const.

 --- mainline.orig/arch/powerpc/platforms/pasemi/setup.c
 +++ mainline/arch/powerpc/platforms/pasemi/setup.c

I never know who maintains random-scruffy-ppc code like this.  From a peek
in the git-whatchanged output, it appears to be yourself.


Have a few little fixies:

--- a/drivers/pcmcia/electra_cf.c~pcmcia-compactflash-driver-for-pa-semi-electra-boards-fix
+++ a/drivers/pcmcia/electra_cf.c
@@ -201,9 +201,7 @@ static int __devinit electra_cf_probe(st
 	if (!cf)
 		return -ENOMEM;
 
-	init_timer(&cf->timer);
-	cf->timer.function = electra_cf_timer;
-	cf->timer.data = (unsigned long) cf;
+	setup_timer(&cf->timer, electra_cf_timer, (unsigned long)cf);
 	cf->irq = NO_IRQ;
 
 	cf->ofdev = ofdev;
@@ -340,16 +338,14 @@ static int __devexit electra_cf_remove(s
 	return 0;
 }
 
-static struct of_device_id electra_cf_match[] =
-{
+static struct of_device_id electra_cf_match[] = {
 	{
 		.compatible   = "electra-cf",
 	},
 	{},
 };
 
-static struct of_platform_driver electra_cf_driver =
-{
+static struct of_platform_driver electra_cf_driver = {
 	.name  = (char *)driver_name,
 	.match_table= electra_cf_match,
 	.probe= electra_cf_probe,
@@ -371,4 +367,3 @@ module_exit(electra_cf_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR ("Olof Johansson [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("PA Semi Electra CF driver");
-
_
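
The first hunk folds three statements into one helper call. A userspace sketch of that pattern follows; `timer_sketch` and the `*_sketch` functions are hypothetical stand-ins for struct timer_list and the kernel helpers, written only to show the shape of the change.

```c
/* Hypothetical userspace stand-in for the kernel's struct timer_list. */
struct timer_sketch {
	void (*function)(unsigned long);
	unsigned long data;
	int initialized;
};

/* Stand-in for init_timer(): marks the timer as ready for use. */
static void init_timer_sketch(struct timer_sketch *t)
{
	t->initialized = 1;
}

/* One call that sets the callback and its argument and initializes the timer. */
static void setup_timer_sketch(struct timer_sketch *t,
			       void (*fn)(unsigned long), unsigned long data)
{
	t->function = fn;
	t->data = data;
	init_timer_sketch(t);
}

/* Example callback: remembers the argument it was fired with. */
static unsigned long last_fired;

static void record_fire(unsigned long data)
{
	last_fired = data;
}
```

A caller then writes setup_timer_sketch(&t, record_fire, 42UL) once, which leaves no room to forget one of the three assignments.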



Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Alan Cox
On Fri, 20 Jul 2007 18:38:39 -0400
Jeff Garzik [EMAIL PROTECTED] wrote:

 I agree with Andi...  it's quite nice to be able to leave some arch/i386 
 stuff, and not carry it over to arch/x86-64.

It's easy enough to push that stuff into arch/x86/legacy and have one
subdirectory of stuff to pull in for ancient systems.

Alan


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Michal Piotrowski

Hi,

On 21/07/07, Thomas Gleixner [EMAIL PROTECTED] wrote:

We are pleased to announce a project we've been working on for some
time: the unified x86 architecture tree, or arch/x86 - and we'd like
to solicit feedback about it.

What is this about?

[..]

As usual, comments and suggestions are welcome!


I really like this idea - code duplication is a bad thing.

BTW. I don't see any regression here :)



Thomas, Ingo


Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Ingo Molnar

* Michal Piotrowski [EMAIL PROTECTED] wrote:

 We are pleased to announce a project we've been working on for some 
 time: the unified x86 architecture tree, or arch/x86 - and we'd 
 like to solicit feedback about it.
 
 What is this about?
 [..]
 As usual, comments and suggestions are welcome!
 
 I really like this idea - code duplication is a bad thing.
 
 BTW. I don't see any regression here :)

cool - could you tell us a bit more about what type of box you tried
it on, and how wide and versatile the .config is?

Ingo


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

Oh, which means ...


On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:

On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
 On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
  On Fri, 20 Jul 2007 15:50:47 -0700
  Greg KH [EMAIL PROTECTED] wrote:
 
   On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
 Hi Greg,
   
 This looks like a sysfs bug
 http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
   
 l *kernel_param_sysfs_setup+0x75
 0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
 565 mk->mod = THIS_MODULE;
 566 kobj_set_kset_s(mk, module_subsys);



 567 kobject_set_name(&mk->kobj, name);


Shouldn't the return of kobject_set_name() be checked here?

[ Looking at code, and realizing that kobject_set_name() manages to
succeed even when given a null string! ]


 568 kobject_init(&mk->kobj);
 569 ret = kobject_add(&mk->kobj);
 570 BUG_ON(ret < 0);
 571 param_sysfs_setup(mk, kparam, num_params, name_skip);
 572 kobject_uevent(&mk->kobj, KOBJ_ADD);
 573 }
 574
   
 http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
  
   What kernel version is this happening on?  The -mm tree?  Can you try
   Linus's tree instead?
  
   It looks like there was some needed information right before the first
   stack dump, showing exactly what kobject was trying to be added that was
   already present.  Odds are this is a kernel parameter with the same name
   as a duplicate one within the same module,

I don't think that's an -EEXIST.

I think what we have here is kobject_add() exiting with -EINVAL.
(kobject attempted to be registered with no name!)

[ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
we would've seen an offset in kobject_shadow_add closer to 0x189,
because the dump_stack() for EEXIST is barely 4 instructions before
we return from that function. ]

   but the trick is going to be
   trying to figure out what module is causing this.

So I'd guess we want to search for a module that's passing a kobject *
to kobject_add() such that !kobj->k_name is true.


Oh, that's kernel_param_sysfs_setup itself. So we actually need to
search for a built-in module in Michal's config that ... has an ... empty
 modname !? Shouldn't that turn up pretty quickly in a grep?

How do I do that, btw?


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread H. Peter Anvin
Alan Cox wrote:
 On Fri, 20 Jul 2007 18:38:39 -0400
 Jeff Garzik [EMAIL PROTECTED] wrote:
 
 I agree with Andi...  it's quite nice to be able to leave some arch/i386 
 stuff, and not carry it over to arch/x86-64.
 
 It's easy enough to push that stuff into arch/x86/legacy and have one
 subdirectory of stuff to pull in for ancient systems.

The other thing is that legacy in this context is fungible.  No IOMMU
was legacy until the Intel x86-64 chips came out, and I can promise you
that some legacy code will be necessary once we start seeing VIA and
others come out with embedded x86-64.

On the other hand, it's pretty bloody safe to assume that we'll never
see an x86-64 chip without CPUID, CMOV, FXSAVE, SSE-2, CMPXCHG, etc.

-hpa


Re: joydev.c and saitek cyborg evo force

2007-07-20 Thread Renato Golin

On 20/06/07, Jiri Kosina [EMAIL PROTECTED] wrote:

Could you please send me the report descriptor of the device, so that I
could debug it locally here?


Hi Jiri,

Sorry for the delay. Below is the report descriptor, and attached is the
full report from when I connected the joystick.


report descriptor (size 851, read 851) =  05 01 09 04 a1 01 09 01 a1
00 85 06 09 30 15 00 26 00 10 35 00 46 00 10 75 10 95 01 81 02 09 31
81 02 05 02 09 bb 26 ff 00 46 ff 00 75 08 81 02 05 09 19 01 29 0c 25
01 45 01 75 01 95 0c 81 02 05 01 09 39 25 07 46 3b 01 55 00 65 44 75
04 95 01 81 42 65 00 05 02 09 ba 26 ff 00 46 ff 00 75 08 81 02 c0 05
0f 09 92 a1 02 85 02 09 a6 09 a4 09 a0 09 9f 25 01 45 00 75 01 95 04
81 02 75 04 95 01 81 03 09 22 75 07 25 09 81 02 09 94 75 01 25 01 81
02 75 08 81 03 c0 09 21 a1 02 85 0b 09 22 25 09 91 02 09 25 a1 02 09
26 09 30 09 32 09 31 09 33 09 34 09 40 09 41 15 01 25 08 91 00 c0 09
53 25 0c 75 05 91 02 09 56 15 00 25 01 75 01 91 02 09 55 a1 02 05 01
09 30 09 31 95 02 91 02 c0 05 0f 09 50 27 fe ff 00 00 47 fe ff 00 00
75 10 95 01 55 fd 66 01 10 91 02 55 00 65 00 09 57 26 ff 00 46 68 01
75 08 65 44 91 02 65 00 09 54 27 fe ff 00 00 47 fe ff 00 00 75 10 55
fd 66 01 10 91 02 55 00 65 00 09 58 a1 02 05 0a 09 01 09 02 26 2b 01
45 00 95 02 91 02 c0 05 0f 09 a7 27 fe ff 00 00 47 fe ff 00 00 95 01
55 fd 66 01 10 91 02 55 00 65 00 c0 09 5a a1 02 85 0c 09 23 26 2b 01
45 00 91 02 09 5c 26 10 27 46 10 27 55 fd 66 01 10 91 02 55 00 65 00
09 5b 25 7f 75 08 91 02 09 5e 26 10 27 75 10 55 fd 66 01 10 91 02 55
00 65 00 09 5d 25 7f 75 08 91 02 c0 09 73 a1 02 85 0d 09 23 26 2b 01
45 00 75 10 91 02 09 70 15 81 25 7f 36 f0 d8 46 10 27 75 08 91 02 c0
09 6e a1 02 85 0e 09 23 15 00 26 2b 01 35 00 45 00 75 10 91 02 09 70
25 7f 46 10 27 75 08 91 02 09 6f 15 81 36 f0 d8 91 02 09 71 15 00 26
ff 00 35 00 46 68 01 91 02 09 72 26 10 27 46 10 27 75 10 55 fd 66 01
10 91 02 55 00 65 00 c0 09 5f a1 02 85 0f 09 23 26 2b 01 45 00 91 02
09 61 15 9c 25 64 36 f0 d8 46 10 27 75 08 91 02 09 62 91 02 09 60 16
0c fe 26 f4 01 75 10 91 02 09 65 15 00 26 e8 03 35 00 91 02 09 63 25
64 75 08 91 02 09 64 91 02 c0 09 77 a1 02 85 51 09 22 25 09 45 00 91
02 09 78 a1 02 09 7b 09 79 09 7a 15 01 25 03 91 00 c0 09 7c 15 00 26
fe 00 91 02 c0 09 92 a1 02 85 52 09 96 a1 02 09 9a 09 99 09 97 09 98
09 9b 09 9c 15 01 25 06 91 00 c0 c0 05 ff 0a 01 03 a1 02 85 40 0a 02
03 a1 02 1a 11 03 2a 20 03 25 10 91 00 c0 0a 03 03 15 00 27 ff ff 00
00 75 10 91 02 c0 05 0f 09 7d a1 02 85 43 09 7e 26 80 00 46 10 27 75
08 91 02 c0 09 85 a1 02 85 44 09 86 27 ff ff 00 00 45 00 75 10 91 02
09 87 91 02 09 88 91 02 c0 05 ff 0a 00 01 a1 02 85 81 05 01 09 30 15
81 25 7f 36 f0 d8 46 10 27 75 08 91 02 09 31 91 02 c0 05 0f 09 7f a1
02 85 0b 09 80 15 00 26 ff 7f 35 00 45 00 75 0f b1 03 09 a9 25 01 75
01 b1 03 09 83 26 ff 00 75 08 b1 03 09 84 25 10 b1 03 09 a8 a1 02 09
73 09 6e 09 5a 09 5f 95 04 b1 03 c0 c0 c0
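
For anyone reading the dump by hand, a minimal sketch of decoding one short item, assuming the standard HID short-item layout (prefix byte: bits 0-1 size code, bits 2-3 type, bits 4-7 tag). `hid_item_sketch` and `hid_parse_item()` are hypothetical names. Long items (prefix 0xfe) are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded short item from a HID report descriptor. */
struct hid_item_sketch {
	uint8_t type;	/* 0 = Main, 1 = Global, 2 = Local */
	uint8_t tag;	/* upper four bits of the prefix byte */
	uint8_t size;	/* number of data bytes: 0, 1, 2 or 4 */
	uint32_t data;	/* little-endian payload, zero-extended */
};

/*
 * Decode the short item at the start of buf. Returns the number of
 * bytes consumed, or 0 if the buffer is empty, truncated, or starts
 * with a long item (prefix 0xfe).
 */
static size_t hid_parse_item(const uint8_t *buf, size_t len,
			     struct hid_item_sketch *out)
{
	static const uint8_t sizes[4] = { 0, 1, 2, 4 };
	size_t i;

	if (len == 0 || buf[0] == 0xfe)
		return 0;
	out->size = sizes[buf[0] & 0x3];
	out->type = (buf[0] >> 2) & 0x3;
	out->tag = (buf[0] >> 4) & 0xf;
	if (len < 1u + out->size)
		return 0;
	out->data = 0;
	for (i = 0; i < out->size; i++)
		out->data |= (uint32_t)buf[1 + i] << (8 * i);
	return 1u + out->size;
}
```

Applied to the first bytes above, `05 01` decodes as a Global item with tag 0 (Usage Page) and data 1, `09 04` as a Local item with tag 0 (Usage) and data 4, and `a1 01` as a Main item with tag 0xa (Collection) and data 1.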

cheers,
--renato

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm


joy-dmesg.log.gz
Description: GNU Zip compressed data


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Michal Piotrowski

On 21/07/07, Ingo Molnar [EMAIL PROTECTED] wrote:


* Michal Piotrowski [EMAIL PROTECTED] wrote:

 We are pleased to announce a project we've been working on for some
 time: the unified x86 architecture tree, or arch/x86 - and we'd
 like to solicit feedback about it.
 
 What is this about?
 [..]
 As usual, comments and suggestions are welcome!

 I really like this idea - code duplication is a bad thing.

 BTW. I don't see any regression here :)

cool - could you tell us a bit more about what type of box you tried
it on,


it is an old P4 (i386)


and how wide and versatile the .config is?


http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-git15/config



Ingo



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Thursday 19 July 2007 4:16:17 am Cornelia Huck wrote:
 On Wed, 18 Jul 2007 13:39:53 -0400,

 Rob Landley [EMAIL PROTECTED] wrote:
  Nope.  If you recurse down under /sys/class following symlinks, you go
  into an endless loop bouncing off of /sys/devices and getting pointed
  back.  If you don't follow symlinks, it works fine up until about 2.6.20
  at which point things that were previously directories BECAME symlinks
  because the directories got moved, and it all broke.

 I have no idea what you're doing.

See the email to Kay Sievers.  In 2.6.14 following symlinks hit an endless
/sys/block/hda/device/block/device/block/device/block...  This has changed 
since, like much of sysfs, but in the absence of either a spec or a stable 
API there's no guarantee it won't reoccur.

  Which is why I want it documented where to look for these suckers.  Just
  give me ONE STABLE WAY TO FIND THIS INFORMATION, PLEASE.

 See Documentation/sysfs-rules.txt.

Ok:

Paragraph 1: It's not stable.
Paragraph 2: It's not stable.
Paragraph 3: If you really really need to access it directly...
Paragraph 4: DO NOT DO $XXX.
Paragraph 5: Expect it to be mounted at /sys
Paragraph 6: DO NOT DO $XXX.  (Specifically, the way you were distinguishing
between block and char devices?  Don't do that.  No, we won't tell you what 
to replace it with, keep reading.)

So far, not exactly gripping reading.

Paragraph 7: What a devpath is.  Ok, is it just me or does it say that 
applications shouldn't use the symlinks in sysfs?  Why are they there, then?

Paragraph 8: The kernel has a name for the device.
Paragraph 9: Subsystem is a string.  What it means, we leave for you to guess.
Paragraph 10: Driver is the name of a driver.  (Does this mean a driver is 
currently loaded and handling the device, or that the kernel is suggesting a 
driver based on something like PCI ID, through the kind of mechanism that 
used to be used to request module loading?  Experimentally, it looks like the 
first, which makes sense but isn't specified.  Does something 
like /sys/class/mem/zero have a driver?  Experimentally, no, it hasn't got
a device link.)
Paragraph 11: Attributes, and yet more DO NOT DO $XXX.  It took me three reads
of that to figure out they probably meant Attributes belong to a device, 
don't confuse the attributes of another device with attributes of this 
device.  (Following _which_ device symlink?)

Ok, back up.  /sys/devices does not contain all the information necessary to 
populate /dev, because it hasn't got things like 
ramdisks, /dev/zero, /dev/console which are THERE in sysfs, which may or may 
not be supported by the kernel (the kernel might have ramdisk support, might 
not).  These things could also, in future, have their major and minor numbers 
dynamically (even randomly) assigned.  That's been discussed on this list.

I'm not trying to document /sys/devices.  I'm trying to document hotplug, 
populating /dev, and things like firmware loading that fall out of that.  
This requires use of sysfs, and I'm only trying to document as much of sysfs 
as you need to do that.  I'm not documenting stuff 
like /sys/devices/system/cpu.
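
As a concrete piece of the "populating /dev" part, a small sketch of reading the numbers out of a sysfs "dev" attribute. It assumes the attribute text has the form "major:minor" followed by a newline. `parse_dev_attr()` is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Parse the text of a sysfs "dev" attribute, e.g. "8:1\n".
 * Returns 0 and fills major/minor on success, -1 on malformed input
 * (missing number, missing colon, or trailing non-space characters).
 */
static int parse_dev_attr(const char *text, unsigned int *major,
			  unsigned int *minor)
{
	char trailing;
	int n = sscanf(text, "%u:%u %c", major, minor, &trailing);

	return n == 2 ? 0 : -1;
}
```

The device type (block or char) is not in that file, which is exactly the piece the surrounding directory structure is being used to supply.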

The consensus so far is the udev implementation is the spec, except I 
watched the udev implementation change rather a lot before I stopped tracking 
it, and saw a number of people complain on this list about things breaking 
when they upgraded the kernel but not udev.

Back to reading the document:
 - Properties of parent devices never belong into a child device.

Belong into?

  Always look at the parent devices themselves for determining device
  context properties.

For determining?

What was the original language of this document?

 If the device 'eth0' or 'sda' does not have a
   driver-link, then this device does not have a driver.

Again, whether they mean the kernel was not built with a driver that can 
handle this device or no driver is currently loaded and handling this 
device.  It _sounds_ like this device is not supported by Linux, which 
probably isn't what they meant.

 Never copy any property of the parent-device into a child-device.

I note that the only mention made so far of parent-child relationships in 
devices is in terms of don'ts.  I assume they're talking about how a 
partition can be the child of a block device, and a network controller card 
can be the child of a pci bus device?

Ah, I see.  The next paragraph is on hierarchy, yet doesn't actually explain 
anything, other than to imply that the device hierarchy being fully 
represented there is a dream to be achieved sometime in the future but not 
necessarily the truth with today's kernels, because stuff is still being 
_moved_ into /sys/devices.

 - Classification by subsystem
  There are currently three places for classification of devices:
  /sys/block, /sys/class and /sys/bus.

So if somebody wants to write code that runs on a current kernel, they have no 
alternative but to look in these three places.  If future kernels 

Re: v2.6.22.1-rt3

2007-07-20 Thread Thomas Gleixner
On Thu, 2007-07-19 at 20:37 -0700, Daniel Walker wrote:
 The broken out series is here,
 ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1.tar.gz

I'll pick that up soon.

Thanks,

tglx




Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Michal Piotrowski

On 21/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:

Oh, which means ...


On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
 On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
  On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
   On Fri, 20 Jul 2007 15:50:47 -0700
   Greg KH [EMAIL PROTECTED] wrote:
  
On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
  Hi Greg,

  This looks like a sysfs bug
  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg

  l *kernel_param_sysfs_setup+0x75
  0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
  565 mk->mod = THIS_MODULE;
  566 kobj_set_kset_s(mk, module_subsys);

  567 kobject_set_name(&mk->kobj, name);

Shouldn't the return of kobject_set_name() be checked here?

[ Looking at code, and realizing that kobject_set_name() manages to
succeed even when given a null string! ]

  568 kobject_init(&mk->kobj);
  569 ret = kobject_add(&mk->kobj);
  570 BUG_ON(ret < 0);
  571 param_sysfs_setup(mk, kparam, num_params, name_skip);
  572 kobject_uevent(&mk->kobj, KOBJ_ADD);
  573 }
  574

  http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
   
What kernel version is this happening on?  The -mm tree?  Can you try
Linus's tree instead?
   
It looks like there was some needed information right before the first
stack dump, showing exactly what kobject was trying to be added that was
already present.  Odds are this is a kernel parameter with the same name
as a duplicate one within the same module,

 I don't think that's an -EEXIST.

 I think what we have here is kobject_add() exiting with -EINVAL.
 (kobject attempted to be registered with no name!)

 [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
 That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
 we would've seen an offset in kobject_shadow_add closer to 0x189,
 because the dump_stack() for EEXIST is barely 4 instructions before
 we return from that function. ]

but the trick is going to be
trying to figure out what module is causing this.

 So I'd guess we want to search for a module that's passing a kobject *
 to kobject_add() such that !kobj->k_name is true.

Oh, that's kernel_param_sysfs_setup itself. So we actually need to
search for a built-in module in Michal's config that ... has an ... empty
 modname !?


I'll try to figure this out


Shouldn't that turn up pretty quickly in a grep?

How do I do that, btw?



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Arnd Bergmann
On Saturday 21 July 2007, Thomas Gleixner wrote:
 The topic of sharing more x86 code has been discussed on LKML a number 
 of times. Various approaches were discussed and we decided to advance 
 the discussion by implementing a full solution that brings the 
 transition to a shared tree to completion.

Great stuff. I've worked on doing the same for s390 and powerpc
in the past, and really think it's the right thing to do. I've
even started my own x86 merge two or three times in the past
but never got very far because of the quickly moving source.

 In this initial implementation the old arch/i386 and arch/x86_64 trees 
 are removed _immediately_, in the same commit, and all future x86 
 development goes on in the new, shared tree. So the transition right now 
 is one atomic operation.
 
 As a next step we plan to generate a gradual, fully bisectable, fully
 working switchover from the current code to the fully populated
 arch/x86 tree. It will result in about 1000-2000 commits. We are
 releasing our current solution because it 100% represents the finally
 resulting arch/x86 source tree already, and we first wanted to make
 sure that the new architecture layout works fine and folks are happy
 before we go and do the (even more complex) fine-grained work.

I don't think it's really good to do it this way, or maybe I'm still
misunderstanding where you're going. If you really want to end
up with the exact set of files that you have in your tree now, I see
absolutely zero point in making it bisectable. On the contrary,
there is nothing particularly complicated in it, so once it has
seen some amount of testing it can better get merged in one
big changeset. I'm just not convinced that it actually is what
we want to end up with.

In my experience, it's very helpful to have a single set of header
files, and merging the two versions of one header usually exposes
bugs that have been fixed in only one of the two, so you get
to fix actual bugs in the process.

In the s390 merge, I also started out in an attempt to guarantee
unchanged object files, much like what you describe. However, it
turned out that fixing it in the process is actually easier.
Either way, 'diff -D __x86_64__' is a great tool for a start, you
should try it out to see how easy it is to merge a lot of files.
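
For illustration, this is roughly the shape of merged output that `diff -D __x86_64__ old new` produces where the two inputs differ. The macro `SKETCH_BITS_PER_LONG` and the helper are made-up names, not taken from any real header.

```c
/*
 * Lines that differ between the two inputs are wrapped like this.
 * Lines common to both files would appear once, outside any #ifdef.
 */
#ifndef __x86_64__
#define SKETCH_BITS_PER_LONG 32
#else /* __x86_64__ */
#define SKETCH_BITS_PER_LONG 64
#endif /* __x86_64__ */

/* Shows which branch the preprocessor picked on the build target. */
static int sketch_bits_per_long(void)
{
	return SKETCH_BITS_PER_LONG;
}
```

The result compiles for either target from one file, which makes it a convenient starting point for cleaning up by hand.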

To put it into perspective, I think the s390 merge was a lot easier
than the x86 merge, because there is only a very limited set of
hardware configurations for s390 compared to others. We ended up
doing the full merge with three people within less than a week
and no separate files at all.

OTOH, the powerpc merge is now going into its third year, mostly
because it was started with the intention to remove all cruft
in the process and to only allow sane code into the new architecture.

The steps that I'd suggest instead are:

* merge all exported header files of the two architectures. This
  alone is a worthy goal, because it allows us to get rid of
  the ugly code for deciding which version to use in installed
  headers and elsewhere.

* Merge the remaining header files, to end up with a single
  include/asm-x86 directory.

* Come up with a model that integrates the machine type selection
  of i386 with the way we build things on x86_64. One way would
  be to make X86_64 another platform next to X86_PC, X86_VOYAGER
  and the others.

* Create an arch/x86/Kconfig that handles the new common
  configuration

* Create an arch/x86/Makefile that descends into ../i386/* and
  ../x86_64/* instead of its subdirectories.

* Merge the arch/x86/* subdirectories, one at a time, starting with
  the low-hanging fruit like oprofile or pci, and do the hard
  ones like mm and kernel last.

Unfortunately, I don't think I'll spend much time on this, so I
don't get to decide on it, but you asked for feedback ;-)

Arnd 


Re: [git patches] two warning fixes

2007-07-20 Thread Benjamin Herrenschmidt
On Fri, 2007-07-20 at 20:34 +0200, Krzysztof Halasa wrote:
 Linus Torvalds [EMAIL PROTECTED] writes:
 
  More people *should* generally ask themselves: was the warning worth it? 
  and then, if the answer is no, they shouldn't add code, they should 
  remove the thing that causes the warning in the first place.
 
 Sure. If a routine uses must_check yet its return value may be
 safely ignored then that must_check is simply misplaced and should
 be removed. It does not mean all must_checks are bad - each of them
 isn't bad unless one can demonstrate it is.
 
 Back to sysfs_create_bin_file() - if one can demonstrate a caller
 can safely ignore the return value (which, it seems, is the
 case), then exactly this very must_check should be removed

Typically, the EDID creation in radeonfb :-)

In fact, I'm not even sure there's -any- user of those sysfs files. I
added them back then to allow distros to extract the EDID infos that
were probed by radeonfb to properly configure the X server (because on
some machines, the EDID is coming from the firmware/BIOS, not from DDC,
and X can't get at it). I don't know if they ever used them.

In any case, it doesn't make sense to abort initialization of the driver
if for some reason those files can't be created (for example, the core
fbdev starts exposing EDID files, radeonfb isn't properly updated, name
clash, error). Aborting the initialization will make sure that on some
machines such as powermacs with radeon, whatever error is displayed will
never be seen by the user.

That's a typical example, but I have plenty more.

For example, the powermac thermal control drivers. They work pretty well
by themselves. They also expose via sysfs all the current values, fan
speeds, temps, etc. for the sake of whoever wants to do a GUI or
monitor what's going on, but that is not critical to the operation of
the driver. Thus, failure to create those files is not critical.

I have plenty of other examples.

Thus, we have two choices here:

 - The simple one: sysfs_create_blah() displays a warning when it fails
and has no must_check

 - The one that adds code everywhere (the current one):
sysfs_create_blah() returns an error, has must_check, and thus all
callers like the ones I described above need to add code to test it and
print a warning. Lots of added .text and .data for little benefit.
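
The difference between the two choices can be sketched in plain userspace
C. This is only an illustration under stated assumptions:
create_file_checked(), create_file_warn() and probe_optional_attr() are
made-up names standing in for sysfs_create_blah() and a driver probe
routine, and the GCC/Clang warn_unused_result attribute stands in for
must_check.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Counts warnings so the behaviour can be observed; illustration only. */
int warnings_printed;

/* "Current" style: the callee only returns an error and is marked
 * must-check, so every caller has to carry its own test-and-warn code.
 * Hypothetical stand-in, not the kernel API. */
__attribute__((warn_unused_result))
int create_file_checked(const char *name, int simulate_failure)
{
	assert(name != NULL);
	return simulate_failure ? -EEXIST : 0;
}

/* "Simple" style: the callee prints the warning itself and carries no
 * must-check annotation, so callers may ignore the result. */
int create_file_warn(const char *name, int simulate_failure)
{
	int ret = create_file_checked(name, simulate_failure);

	if (ret) {
		fprintf(stderr, "warning: could not create %s (%d)\n",
			name, ret);
		warnings_printed++;
	}
	return ret;
}

/* A caller for which the file is optional: initialization carries on
 * whether or not the file could be created. */
int probe_optional_attr(int simulate_failure)
{
	create_file_warn("edid1", simulate_failure);
	return 0;
}
```

With the warning centralized in the callee, the optional-file caller needs
no error-handling code of its own, which is the saving in .text and .data
argued for above.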

Cheers,
Ben.




[GIT PULL] MMC updates

2007-07-20 Thread Pierre Ossman
Linus, please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc.git for-linus

to receive the following updates:

 MAINTAINERS                 |    7 ++++++-
 drivers/mmc/host/at91_mci.c |   13 ++++++++++++-
 drivers/mmc/host/sdhci.c    |    2 ++
 drivers/mmc/host/sdhci.h    |    1 +
 4 files changed, 21 insertions(+), 2 deletions(-)

Marc Pignat (1):
  mmc: at91_mci: wakeup on card insertion (or removal)

Pierre Ossman (2):
  mmc: add maintainer for at91
  sdhci: make sure to clear the error interrupt

diff --git a/MAINTAINERS b/MAINTAINERS
index fbe0dca..c9fab2b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -645,7 +645,12 @@ W: http://linux-atm.sourceforge.net
 S: Maintained
 
 ATMEL AT91 MCI DRIVER
-S: Orphan
+P: Nicolas Ferre
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED] (subscribers-only)
+W: http://www.atmel.com/products/AT91/
+W: http://www.at91.com/
+S: Maintained
 
 ATMEL MACB ETHERNET DRIVER
 P: Haavard Skinnemoen
diff --git a/drivers/mmc/host/at91_mci.c b/drivers/mmc/host/at91_mci.c
index 28c8818..15aab37 100644
--- a/drivers/mmc/host/at91_mci.c
+++ b/drivers/mmc/host/at91_mci.c
@@ -903,8 +903,10 @@ static int __init at91_mci_probe(struct platform_device *pdev)
 	/*
 	 * Add host to MMC layer
 	 */
-	if (host->board->det_pin)
+	if (host->board->det_pin) {
 		host->present = !at91_get_gpio_value(host->board->det_pin);
+		device_init_wakeup(&pdev->dev, 1);
+	}
 	else
 		host->present = -1;
 
@@ -940,6 +942,7 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
 	host = mmc_priv(mmc);
 
 	if (host->present != -1) {
+		device_init_wakeup(&pdev->dev, 0);
 		free_irq(host->board->det_pin, host);
 		cancel_delayed_work(&host->mmc->detect);
 	}
@@ -966,8 +969,12 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
 static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 {
 	struct mmc_host *mmc = platform_get_drvdata(pdev);
+	struct at91mci_host *host = mmc_priv(mmc);
 	int ret = 0;
 
+	if (device_may_wakeup(&pdev->dev))
+		enable_irq_wake(host->board->det_pin);
+
 	if (mmc)
 		ret = mmc_suspend_host(mmc, state);
 
@@ -977,8 +984,12 @@ static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 static int at91_mci_resume(struct platform_device *pdev)
 {
 	struct mmc_host *mmc = platform_get_drvdata(pdev);
+	struct at91mci_host *host = mmc_priv(mmc);
 	int ret = 0;
 
+	if (device_may_wakeup(&pdev->dev))
+		disable_irq_wake(host->board->det_pin);
+
 	if (mmc)
 		ret = mmc_resume_host(mmc);
 
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 10d15c3..4a24db0 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1024,6 +1024,8 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
 
 	intmask &= ~(SDHCI_INT_CMD_MASK | SDHCI_INT_DATA_MASK);
 
+	intmask &= ~SDHCI_INT_ERROR;
+
 	if (intmask & SDHCI_INT_BUS_POWER) {
 		printk(KERN_ERR "%s: Card is consuming too much power!\n",
 			mmc_hostname(host->mmc));
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 7400f4b..a6c8704 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -107,6 +107,7 @@
 #define  SDHCI_INT_CARD_INSERT 0x0040
 #define  SDHCI_INT_CARD_REMOVE 0x0080
 #define  SDHCI_INT_CARD_INT0x0100
+#define  SDHCI_INT_ERROR   0x8000
 #define  SDHCI_INT_TIMEOUT 0x0001
 #define  SDHCI_INT_CRC 0x0002
 #define  SDHCI_INT_END_BIT 0x0004


-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Wednesday 18 July 2007 7:40:20 pm Greg KH wrote:
 On Wed, Jul 18, 2007 at 01:39:53PM -0400, Rob Landley wrote:
  PICK ONE!  JUST #*%(#% PICK ONE!  HHH!
 
  I don't care where it is.  Just put it somewhere I can find it, and keep
  it there.  All this gratuitous moving stuff around serves NO PURPOSE
  other than to break userspace.  I'm trying to document this so that the
  next time you go "oh wait, it should be at /sys/tarantula/fruitbat" I
  can show that you're breaking an existing documented userspace API.
 
  There's a kernel config option to make symlinks from the old
  location.  /sys/block makes as much sense as any other location, and it's
  what's there now.

 Read the sysfs documentation file we just added, it describes how this
 is all documented and should be used.  So well that I do not think you
 need to try to document it again.

I'm not trying to document all of sysfs, I'm trying to document hotplug.  I 
realize now I should have been more clear about that.

I've been working on the document I just posted on and off since May.
(Possibly longer but I lost a lot of data in the hard drive crash on my 
laptop last month.  For example, I can't find a copy of my 
half-finished history of hotplug document and will probably need to start 
over, although I've still got a few places to look to see if I backed up a 
copy...)

This document has been sitting mostly unchanged on my hard drive since OLS, 
until I finally tracked down example code to do the netlink bit so I could 
finish it.  I tried to bounce a copy of the "everything but netlink" version 
off of Kay by replying to his email with notes from OLS, and that's when I 
bumped into the "he's spam-blocking me" issue.  It got lost in the shuffle of 
OLS, and I just got back to it at the start of this thread.

Earlier today I read (and commented on, in the message to Cornelia Huck) the 
copy of Documentation/sysfs-rules.txt.  (Ah, darn it.  I have too many open 
windows on my desktop.  Hits send on message to Cornelia Huck I _wrote_ 
earlier today.)

Documentation/sysfs-rules.txt doesn't talk about /sbin/hotplug or netlink 
hotplug.  It doesn't say how to distinguish a char device from a block 
device.  It mostly talks about finding stuff under the /sys/devices 
directory, most of which isn't relevant to populating /dev.  It doesn't 
clearly distinguish where you can find information in current kernels (2.6.22 
and earlier) from stuff that hasn't gone into any existing release.  Ideally 
I'd like to identify a subset of that information which is not only present 
in current kernels but should remain findable at that location in future 
kernels.  Over half the document is about what _not_ to do, and consists of 
warnings about buggy apps, despite the assumption that anything _not_ 
explicitly documented is forbidden because most of the things sysfs exports 
are considered unmaintainable.

I've read the stuff under Documentation/ABI/{stable,testing}, and would be 
happy to refer to it rather than duplicating if I could get the info I needed 
out of it.  Documentation/filesystems/sysfs.txt is still from Patrick Mochel 
in 2003 and mostly about the kernel side rather than an API exported to 
userspace, and sysfs-pci.txt in that directory is similar.  Is there more I 
missed?

 thanks,

 greg k-h

Sorry, I'm not trying to be a pain.  I'm trying to document something I had to 
figure out for myself experimentally in 2005, which has been broken for me by 
kernel changes twice since then (when the "device" symlink went in back 
around 2.6.14, and when subdirs turned to symlinks recently), and I'm told is 
changing again with the addition of /sys/class/block (which means /sys/class/* 
no longer contains just char devices).

Ideally I'd like to come up with documentation that allows somebody to write 
one program that works on existing AND on new kernels, hence "stable API".

Rob
-- 
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
 Ok, back up.  /sys/devices does not contain all the information necessary to 
 populate /dev, because it hasn't got things like 
 ramdisks, /dev/zero, /dev/console which are THERE in sysfs, which may or may 
 not be supported by the kernel (the kernel might have ramdisk support, might 
 not).

Welcome to 2007:

$ ls /sys/devices/virtual/mem/
full  kmem  kmsg  mem  null  port  random  urandom  zero
$ ls /sys/devices/virtual/tty/
console  tty12  tty19  tty25  tty31  tty38  tty44  tty50  tty57  tty63
ptmx     tty13  tty2   tty26  tty32  tty39  tty45  tty51  tty58  tty7
tty      tty14  tty20  tty27  tty33  tty4   tty46  tty52  tty59  tty8
tty0     tty15  tty21  tty28  tty34  tty40  tty47  tty53  tty6   tty9
tty1     tty16  tty22  tty29  tty35  tty41  tty48  tty54  tty60
tty10    tty17  tty23  tty3   tty36  tty42  tty49  tty55  tty61
tty11    tty18  tty24  tty30  tty37  tty43  tty5   tty56  tty62

I suggest you take a close look at the kernel before making statements
like the above :)
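
For anyone populating /dev from entries like these, a minimal userspace
sketch follows. It assumes each device directory carries a "dev"
attribute holding "MAJOR:MINOR" text; parse_sysfs_dev() and
read_sysfs_dev() are hypothetical helper names, not an existing API, and
the path passed in is up to the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "MAJOR:MINOR" text of a sysfs "dev" attribute.
 * Returns 0 on success, -1 if the text is malformed. */
int parse_sysfs_dev(const char *text, unsigned int *major,
		    unsigned int *minor)
{
	assert(text && major && minor);
	return sscanf(text, "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Read a "dev" attribute, for example
 * /sys/devices/virtual/mem/zero/dev, and parse it.
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_sysfs_dev(const char *path, unsigned int *major,
		   unsigned int *minor)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_sysfs_dev(buf, major, minor);
	fclose(f);
	return ret;
}
```

The resulting pair is what a tool would hand to mknod(); whether the
device is char or block still has to come from elsewhere.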

 These things could also, in future, have their major and minor numbers 
 dynamically (even randomly) assigned.  That's been discussed on this list.

I tried that once; it will require some core kernel API changes and a
lot of infrastructure work to get that to work properly.  Not that it
will never happen in the future, but it's just not a trivial change at
the moment...

thanks,

greg k-h


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
 I'm not trying to document /sys/devices.  I'm trying to document hotplug, 
 populating /dev, and things like firmware loading that fall out of that.  
 This requires use of sysfs, and I'm only trying to document as much of sysfs 
 as you need to do that.

Like I stated before, you do not need to even have sysfs mounted to have
a dynamic /dev.

And why do you need to document populating /dev dynamically?  udev
already solves this problem for you, and anyone going off and
reinventing udev for their own enjoyment would surely at least look at
how it solves this problem first.

To do otherwise would be foolish :)

Firmware loading is fine to document if you wish to do so.  But again,
why?  We already have multiple userspace programs that provide this
feature.  Perhaps you want to document how to add firmware to a
system in order for these different programs to pick it up?

Or perhaps you want to document how to add this kind of functionality to
your kernel driver so that it can handle firmware loading by using the
firmware interface that the kernel provides?

If you just want to document the hotplug/uevent api, then do just that.
However I think you are overreaching with your scope here and getting
mighty confused in the process.

thanks,

greg k-h


Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
   Always look at the parent devices themselves for determining device
   context properties.
 
 For determining?
 
 What was the original language of this document?

Ok, that's just being mean, cut it out right now if you ever want my
help again.

I'll gladly accept patches for this document that is in the kernel tree
now if you want to send them.  But criticizing the grammar of a document
with statements like this one gets you nowhere and is damn rude.

I suggest you start this thread over if you want my feedback, I'm not
going to respond anymore to this one.

greg k-h


Re: [PATCH] Use the tsk argument in init_new_context()

2007-07-20 Thread Diego Woitasen
On Thu, Jul 19, 2007 at 05:42:38PM -0700, Andrew Morton wrote:
 On Sun,  8 Jul 2007 22:55:08 -0300
 Diego Woitasen [EMAIL PROTECTED] wrote:
 
  Signed-off-by: Diego Woitasen [EMAIL PROTECTED]
  ---
   arch/i386/kernel/ldt.c   |2 +-
   arch/x86_64/kernel/ldt.c |2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/arch/i386/kernel/ldt.c b/arch/i386/kernel/ldt.c
  index e0b2d17..c2eb4fb 100644
  --- a/arch/i386/kernel/ldt.c
  +++ b/arch/i386/kernel/ldt.c
  @@ -96,7 +96,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
   
   	init_MUTEX(&mm->context.sem);
   	mm->context.size = 0;
  -	old_mm = current->mm;
  +	old_mm = tsk->mm;
   	if (old_mm && old_mm->context.size > 0) {
   		down(&old_mm->context.sem);
   		retval = copy_ldt(&mm->context, &old_mm->context);
  diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
  index bc9ffd5..99a92ed 100644
  --- a/arch/x86_64/kernel/ldt.c
  +++ b/arch/x86_64/kernel/ldt.c
  @@ -100,7 +100,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
   
   	init_MUTEX(&mm->context.sem);
   	mm->context.size = 0;
  -	old_mm = current->mm;
  +	old_mm = tsk->mm;
   	if (old_mm && old_mm->context.size > 0) {
   		down(&old_mm->context.sem);
   		retval = copy_ldt(&mm->context, &old_mm->context);
 
 
 When called from dup_mm(), `tsk' refers to the new task and `current'
 refers to the old one.  I'd have expected this to crash during your testing?

Yes, sorry... that patch is bad. Now my question is: why do all
architectures take the task argument when none of them use it? I
understand now that init_new_context() works with current, but what is
the *tsk arg for?



-- 

--
Diego Woitasen


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Gabriel C
Thomas Gleixner wrote:


[...]

 As usual, comments and suggestions are welcome!


Compiles and boots fine here (on my Dell Precision WorkStation 530 MT), and 
nothing has broken so far.

I only got some Kconfig warnings[1] with my config[2], but that is all.

(I don't know whether this matters, but it boots 7.52 seconds faster than 
current git head.)

[1]http://194.231.229.228/linux-x86/warning
[2]http://194.231.229.228/linux-x86/config-x86

 
   Thomas, Ingo
 
 


Regards,

Gabriel C


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Greg KH
On Sat, Jul 21, 2007 at 02:28:52AM +0200, Michal Piotrowski wrote:
  On 21/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:
  Oh, which means ...
 
 
  On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
   On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
 On Fri, 20 Jul 2007 15:50:47 -0700
 Greg KH [EMAIL PROTECTED] wrote:

  On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
Hi Greg,
  
This looks like a sysfs bug
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
broken-out-2007-07-20-00-22/3.jpg
  
l *kernel_param_sysfs_setup+0x75
0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
565 mk->mod = THIS_MODULE;
566 kobj_set_kset_s(mk, module_subsys);
 
567 kobject_set_name(&mk->kobj, name);
 
  Shouldn't the return of kobject_set_name() be checked here?
 
  [ Looking at code, and realizing that kobject_set_name() manages to
  succeed even when given a null string! ]
 
568 kobject_init(&mk->kobj);
569 ret = kobject_add(&mk->kobj);
570 BUG_ON(ret < 0);
571 param_sysfs_setup(mk, kparam, num_params, name_skip);
572 kobject_uevent(&mk->kobj, KOBJ_ADD);
573 }
574
  
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/
broken-out-2007-07-20-00-22/mm-config
 
  What kernel version is this happening on?  The -mm tree?  Can you 
  try
  Linus's tree instead?
 
  It looks like there was some needed information right before the 
  first
  stack dump, showing exactly what kobject was trying to be added 
  that was
  already present.  Odds are this is a kernel parameter with the same 
  name
  as a duplicate one within the same module,
  
   I don't think that's an -EEXIST.
  
   I think what we have here is kobject_add() exiting with -EINVAL.
   (kobject attempted to be registered with no name!)
  
   [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
   That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
   we would've seen an offset in kobject_shadow_add closer to 0x189,
   because the dump_stack() for EEXIST is barely 4 instructions before
   we return from that function. ]
  
  but the trick is going to be
  trying to figure out what module is causing this.
  
   So I'd guess we want to search for a module that's passing a kobject *
   to kobject_add() such that !kobj->k_name is true.
 
  Oh, that's kernel_param_sysfs_setup itself. So we actually need to
  search for a built-in module in Michal's config that ... has an ... empty
   modname !?
 
  I'll try to figure out this

Try the patch below to help you boot and figure out what went wrong.

Post the kernel log results and I'll try to help you out.

thanks,

greg k-h

---
 kernel/params.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
 	kobject_set_name(&mk->kobj, name);
 	kobject_init(&mk->kobj);
 	ret = kobject_add(&mk->kobj);
-	BUG_ON(ret < 0);
+	if (ret) {
+		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
+			"the system will be unstable now.\n", name);
+		return;
+	}
 	param_sysfs_setup(mk, kparam, num_params, name_skip);
 	kobject_uevent(&mk->kobj, KOBJ_ADD);
 }



film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
Just updated one of my machines to 2.6.22.1, and got this during boot..

Starting udev: udevd-event[619]: udev_node_symlink: 
symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) 
failed: File exists

Under 2.6.21, all was fine.

sdc is one disk of a 3 disk raid5 set.
The raidset still manages to come up despite this.

This is a Fedora 7 box, with udev-106-4.1.fc7

What changed this time?

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

Nishanth Aravamudan wrote:

On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:


On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty [EMAIL PROTECTED] wrote:



+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}


This code doesn't have all the ghastly tricks which we deploy to
handle concurrent truncate.


Do I need to ? Baaahh!!  I don't want to deal with them. 


Nick, can you think of any serious consequences of a read/truncate
race in there?  I can't..



All I want is a simple read() to get my oprofile working.  Please
advise.


Did you consider changing oprofile userspace to read the executable
with mmap?



It's not actually oprofile's code, though, it's libbfd (used by
oprofile). And it works fine (presumably) for other binaries.


So... what's the problem with changing it? The fact that it is a
library doesn't really make a difference except that you'll also
help everyone else who links with it.

It won't break backwards compatibility, and it will work on older
kernels...
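
Reading a file through a mapping instead of read() can be sketched
roughly as below. This is a minimal illustration, not libbfd's code:
copy_via_mmap() is a hypothetical helper, it assumes an ordinary file
that fits in the address space, and files on hugetlbfs may have
additional size or alignment requirements that it does not handle.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy up to `len` bytes from the start of `path` into `dst` through a
 * private read-only mapping rather than read().
 * Returns the number of bytes copied, or -1 on error. */
long copy_via_mmap(const char *path, void *dst, size_t len)
{
	struct stat st;
	size_t size;
	void *map;
	int fd;

	assert(path && dst);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	size = (size_t)st.st_size;
	if (size == 0) {
		/* mmap() rejects zero-length mappings; nothing to copy. */
		close(fd);
		return 0;
	}

	map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	if (len > size)
		len = size;
	memcpy(dst, map, len);
	munmap(map, size);
	return (long)len;
}
```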

--
SUSE Labs, Novell Inc.


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:

Just one of my machines to 2.6.22.1, and got this during boot..

Starting udev: udevd-event[619]: udev_node_symlink: 
symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) 
failed: File exists

Under 2.6.21, all was fine.

sdc is one disk of a 3 disk raid5 set.
The raidset still manages to come up despite this.

This is a Fedora 7 box, with udev-106-4.1.fc7

What changed this time?


CONFIG_BLK_DEV_BSG=y?

There's a name-clash, because bsg tries to create devices with the same name.
James sent a patch, it's on lkml.

Kay


Re: where is the code for read system call?

2007-07-20 Thread Karsten Wiese
On Saturday, 21 July 2007, Agarwal, Lomesh wrote:
 My application reads from socket. I need to change the behavior of read
 system call for an experiment. Can someone point me to code?

fs/read_write.c: line 356
asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
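
From userspace, a read() on a socket arrives at that entry point via the
read system call. A minimal sketch, assuming Linux and the C library's
syscall() wrapper; raw_read_from_socket() is a hypothetical helper that
uses a socketpair so it needs no network:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write "ping" into one end of a socketpair and read it back from the
 * other end by invoking the read system call directly.
 * Returns the number of bytes read, or -1 on error. */
long raw_read_from_socket(char *buf, size_t count)
{
	int sv[2];
	long n;

	assert(buf && count > 0);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], "ping", 4) != 4) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* Same effect as read(sv[1], buf, count), minus the libc wrapper. */
	n = syscall(SYS_read, sv[1], buf, count);

	close(sv[0]);
	close(sv[1]);
	return n;
}
```

Running this under strace before and after a kernel change is one cheap
way to confirm the modified path is the one the application hits.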


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
  On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
   Just one of my machines to 2.6.22.1, and got this during boot..
  
   Starting udev: udevd-event[619]: udev_node_symlink: 
   symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) 
   failed: File exists
  
   Under 2.6.21, all was fine.
  
   sdc is one disk of a 3 disk raid5 set.
   The raidset still manages to come up despite this.
  
   This is a Fedora 7 box, with udev-106-4.1.fc7
  
   What changed this time?
  
  CONFIG_BLK_DEV_BSG=y?
  
  There's a name-clash, because bsg tries to create devices with the same name.
  James sent a patch, it's on lkml.

BSG isn't in 2.6.22

Dave
 
-- 
http://www.codemonkey.org.uk


[PATCH 1/7] lguest: documentation pt I: Preparation

2007-07-20 Thread Rusty Russell
The netfilter code had very good documentation: the Netfilter Hacking
HOWTO.  No one ever read it.

So this time I'm trying something different, using a bit of
Knuthiness.  Start with drivers/lguest/README.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 Documentation/lguest/extract  |   58 +
 Documentation/lguest/lguest.c |9 +++--
 drivers/lguest/Makefile   |   12 ++
 drivers/lguest/README |   47 ++
 drivers/lguest/core.c |7 ++-
 drivers/lguest/hypercalls.c   |9 +++--
 drivers/lguest/interrupts_and_traps.c |   13 +++
 drivers/lguest/io.c   |8 +++-
 drivers/lguest/lguest.c   |   30 +++--
 drivers/lguest/lguest_bus.c   |3 +
 drivers/lguest/lguest_user.c  |7 +++
 drivers/lguest/page_tables.c  |   10 -
 drivers/lguest/segments.c |   11 ++
 drivers/lguest/switcher.S |   13 +++
 14 files changed, 218 insertions(+), 19 deletions(-)

===
--- /dev/null
+++ b/Documentation/lguest/extract
@@ -0,0 +1,58 @@
+#! /bin/sh
+
+set -e
+
+PREFIX=$1
+shift
+
+trap 'rm -r $TMPDIR' 0
+TMPDIR=`mktemp -d`
+
+exec 3/dev/null
+for f; do
+while IFS=
+ read -r LINE; do
+   case $LINE in
+   *$PREFIX:[0-9]*:\**)
+   NUM=`echo $LINE | sed s/.*$PREFIX:\([0-9]*\).*/\1/`
+   if [ -f $TMPDIR/$NUM ]; then
+   echo $TMPDIR/$NUM already exits prior to $f
+   exit 1
+   fi
+   exec 3$TMPDIR/$NUM
+   echo $f | sed 's,\.\./,,g'  $TMPDIR/.$NUM
+   /bin/echo $LINE | sed -e s/$PREFIX:[0-9]*// -e s/:\*/*/ 
3
+   ;;
+   *$PREFIX:[0-9]*)
+   NUM=`echo $LINE | sed s/.*$PREFIX:\([0-9]*\).*/\1/`
+   if [ -f $TMPDIR/$NUM ]; then
+   echo $TMPDIR/$NUM already exits prior to $f
+   exit 1
+   fi
+   exec 3$TMPDIR/$NUM
+   echo $f | sed 's,\.\./,,g'  $TMPDIR/.$NUM
+   /bin/echo $LINE | sed s/$PREFIX:[0-9]*// 3
+   ;;
+   *:\**)
+   /bin/echo $LINE | sed -e s/:\*/*/ -e s,/\*\*/,, 3
+   echo 3
+   exec 3/dev/null
+   ;;
+   *)
+   /bin/echo $LINE 3
+   ;;
+   esac
+done  $f
+echo 3
+exec 3/dev/null
+done
+
+LASTFILE=
+for f in $TMPDIR/*; do
+if [ $LASTFILE != $(cat $TMPDIR/.$(basename $f) ) ]; then
+   LASTFILE=$(cat $TMPDIR/.$(basename $f) )
+   echo [ $LASTFILE ]
+fi
+cat $f
+done
+
===
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1,5 +1,10 @@
-/* Simple program to layout physical memory for new lguest guest.
- * Linked high to avoid likely physical memory.  */
+/*P:100 This is the Launcher code, a simple program which lays out the
+ * physical memory for the new Guest by mapping the kernel image and the
+ * virtual devices, then reads repeatedly from /dev/lguest to run the Guest.
+ *
+ * The only trick: the Makefile links it statically at a high address, so it
+ * will be clear of the guest memory region.  It means that each Guest cannot
+ * have more than 2.5G of memory on a normally configured Host. :*/
 #define _LARGEFILE64_SOURCE
 #define _GNU_SOURCE
 #include <stdio.h>
===
--- a/drivers/lguest/Makefile
+++ b/drivers/lguest/Makefile
@@ -5,3 +5,15 @@ obj-$(CONFIG_LGUEST)   += lg.o
 obj-$(CONFIG_LGUEST)   += lg.o
 lg-y := core.o hypercalls.o page_tables.o interrupts_and_traps.o \
segments.o io.o lguest_user.o switcher.o
+
+Preparation Preparation!: PREFIX=P
+Guest: PREFIX=G
+Drivers: PREFIX=D
+Launcher: PREFIX=L
+Host: PREFIX=H
+Switcher: PREFIX=S
+Mastery: PREFIX=M
+Beer:
+   @for f in Preparation Guest Drivers Launcher Host Switcher Mastery; do 
echo {==- $$f -==}; make -s $$f; done; echo {==-==}
+Preparation Preparation! Guest Drivers Launcher Host Switcher Mastery:
+   @sh ../../Documentation/lguest/extract $(PREFIX) `find ../../* -name 
'*.[chS]' -wholename '*lguest*'`
===
--- /dev/null
+++ b/drivers/lguest/README
@@ -0,0 +1,47 @@
+Welcome, friend reader, to lguest.
+
+Lguest is an adventure, with you, the reader, as Hero.  I can't think of many
+5000-line projects which offer both such capability and glimpses of future
+potential; it is an exciting time to be delving into the source!
+
+But be warned; this is an arduous journey of several hours or more!  And as we
+know, all true Heroes are driven by a Noble Goal.  Thus I offer a Beer (or
+equivalent) to anyone I meet who has completed this documentation.
+
+So get 

[PATCH 2/7] lguest: documentation pt II: Guest

2007-07-20 Thread Rusty Russell
Documentation: The Guest

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

---
 drivers/lguest/lguest.c |  458 ---
 drivers/lguest/lguest_asm.S |   57 +++--
 include/linux/lguest.h  |   47 +++-
 3 files changed, 512 insertions(+), 50 deletions(-)

===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -66,6 +66,12 @@
 #include <asm/mce.h>
 #include <asm/io.h>
 
+/*G:010 Welcome to the Guest!
+ *
+ * The Guest in our tale is a simple creature: identical to the Host but
+ * behaving in simplified but equivalent ways.  In particular, the Guest is the
+ * same kernel as the Host (or at least, built from the same source code). :*/
+
 /* Declarations for definitions in lguest_guest.S */
 extern char lguest_noirq_start[], lguest_noirq_end[];
 extern const char lgstart_cli[], lgend_cli[];
@@ -84,7 +90,26 @@ struct lguest_device_desc *lguest_device
 struct lguest_device_desc *lguest_devices;
 static cycle_t clock_base;
 
-static enum paravirt_lazy_mode lazy_mode;
+/*G:035 Notice the lazy_hcall() above, rather than hcall().  This is our first
+ * real optimization trick!
+ *
+ * When lazy_mode is set, it means we're allowed to defer all hypercalls and do
+ * them as a batch when lazy_mode is eventually turned off.  Because hypercalls
+ * are reasonably expensive, batching them up makes sense.  For example, a
+ * large mmap might update dozens of page table entries: that code calls
+ * lguest_lazy_mode(PARAVIRT_LAZY_MMU), does the dozen updates, then calls
+ * lguest_lazy_mode(PARAVIRT_LAZY_NONE).
+ *
+ * So, when we're in lazy mode, we call async_hypercall() to store the call for
+ * future processing.  When lazy mode is turned off we issue a hypercall to
+ * flush the stored calls.
+ *
+ * There's also a hack where mode is set to PARAVIRT_LAZY_FLUSH which
+ * indicates we're to flush any outstanding calls immediately.  This is used
+ * when an interrupt handler does a kmap_atomic(): the page table changes must
+ * happen immediately even if we're in the middle of a batch.  Usually we're
+ * not, though, so there's nothing to do. */
+static enum paravirt_lazy_mode lazy_mode; /* Note: not SMP-safe! */
 static void lguest_lazy_mode(enum paravirt_lazy_mode mode)
 {
if (mode == PARAVIRT_LAZY_FLUSH) {
@@ -108,6 +133,16 @@ static void lazy_hcall(unsigned long cal
async_hcall(call, arg1, arg2, arg3);
 }
 
+/* async_hcall() is pretty simple: I'm quite proud of it really.  We have a
+ * ring buffer of stored hypercalls which the Host will run through next time we
+ * do a normal hypercall.  Each entry in the ring has 4 slots for the hypercall
+ * arguments, and a hcall_status word which is 0 if the call is ready to go,
+ * and 255 once the Host has finished with it.
+ *
+ * If we come around to a slot which hasn't been finished, then the table is
+ * full and we just make the hypercall directly.  This has the nice side
+ * effect of causing the Host to run all the stored calls in the ring buffer
+ * which empties it for next time! */
 void async_hcall(unsigned long call,
 unsigned long arg1, unsigned long arg2, unsigned long arg3)
 {
@@ -115,6 +150,9 @@ void async_hcall(unsigned long call,
static unsigned int next_call;
unsigned long flags;
 
+   /* Disable interrupts if not already disabled: we don't want an
+* interrupt handler making a hypercall while we're already doing
+* one! */
local_irq_save(flags);
if (lguest_data.hcall_status[next_call] != 0xFF) {
/* Table full, so do normal hcall which will flush table. */
@@ -124,7 +162,7 @@ void async_hcall(unsigned long call,
lguest_data.hcalls[next_call].edx = arg1;
lguest_data.hcalls[next_call].ebx = arg2;
lguest_data.hcalls[next_call].ecx = arg3;
-   /* Make sure host sees arguments before valid flag. */
+   /* Arguments must all be written before we mark it to go */
wmb();
lguest_data.hcall_status[next_call] = 0;
if (++next_call == LHCALL_RING_SIZE)
@@ -132,9 +170,14 @@ void async_hcall(unsigned long call,
}
local_irq_restore(flags);
 }
-
+/*:*/
+
+/* Wrappers for the SEND_DMA and BIND_DMA hypercalls.  This is mainly because
+ * Jeff Garzik complained that __pa() should never appear in drivers, and this
+ * helps remove most of them.   But also, it wraps some ugliness. */
 void lguest_send_dma(unsigned long key, struct lguest_dma *dma)
 {
+   /* The hcall might not write this if something goes wrong */
dma->used_len = 0;
hcall(LHCALL_SEND_DMA, key, __pa(dma), 0);
 }
@@ -142,11 +185,16 @@ int lguest_bind_dma(unsigned long key, s
 int lguest_bind_dma(unsigned long key, struct lguest_dma *dmas,
unsigned int num, u8 irq)
 {
+   /* This is the only hypercall which actually wants 5 
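
The ring described in the async_hcall() comment above can be sketched in
user space roughly as follows.  This is only an illustration under stated
assumptions: RING_SIZE, struct demo_ring, host_flush() and direct_call()
are hypothetical stand-ins (the patch uses LHCALL_RING_SIZE, lguest_data
and a real hypercall), and interrupt masking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE  4     /* hypothetical; stands in for LHCALL_RING_SIZE */
#define SLOT_FREE  0xFF  /* the consumer has finished with this slot */
#define SLOT_READY 0x00  /* the producer has filled this slot */

struct demo_call { unsigned long op, arg1, arg2, arg3; };

struct demo_ring {
	struct demo_call calls[RING_SIZE];
	uint8_t status[RING_SIZE];
	unsigned int next;      /* next slot the producer will try */
	unsigned int executed;  /* calls the pretend Host has run so far */
};

static void ring_init(struct demo_ring *r)
{
	memset(r, 0, sizeof(*r));
	memset(r->status, SLOT_FREE, sizeof(r->status));
}

/* Stand-in for the Host: run every ready slot, then mark it free. */
static void host_flush(struct demo_ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		if (r->status[i] == SLOT_READY) {
			r->executed++;
			r->status[i] = SLOT_FREE;
		}
	}
}

/* Stand-in for a direct hypercall: the Host also drains the ring. */
static void direct_call(struct demo_ring *r, const struct demo_call *c)
{
	(void)c;
	host_flush(r);
	r->executed++;
}

/* Queue the call if the next slot is free, otherwise call directly.
 * Returns 1 if the call was queued, 0 if it was issued directly. */
static int deferred_call(struct demo_ring *r, const struct demo_call *c)
{
	assert(r->next < RING_SIZE);
	if (r->status[r->next] != SLOT_FREE) {
		/* Ring full: the direct call empties it as a side effect. */
		direct_call(r, c);
		return 0;
	}
	r->calls[r->next] = *c;
	/* Real code needs a write barrier here (wmb() in the patch) so the
	 * arguments are visible before the slot is marked ready. */
	r->status[r->next] = SLOT_READY;
	if (++r->next == RING_SIZE)
		r->next = 0;
	return 1;
}
```

Filling all RING_SIZE slots queues every call; the next call finds its slot
still marked ready, goes direct, and leaves the ring empty again.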

[PATCH 3/7] lguest: documentation pt III: Drivers

2007-07-20 Thread Rusty Russell
Documentation: The Drivers

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

---
 drivers/block/lguest_blk.c  |  171 +++---
 drivers/char/hvc_lguest.c   |   77 +
 drivers/lguest/lguest_bus.c |   72 
 drivers/net/lguest_net.c|  222 +++
 include/linux/lguest_bus.h  |5 
 include/linux/lguest_launcher.h |   60 ++
 6 files changed, 565 insertions(+), 42 deletions(-)

===
--- a/drivers/block/lguest_blk.c
+++ b/drivers/block/lguest_blk.c
@@ -1,6 +1,12 @@
-/* A simple block driver for lguest.
- *
- * Copyright 2006 Rusty Russell [EMAIL PROTECTED] IBM Corporation
+/*D:400
+ * The Guest block driver
+ *
+ * This is a simple block driver, which appears as /dev/lgba, lgbb, lgbc etc.
+ * The mechanism is simple: we place the information about the request in the
+ * device page, then use SEND_DMA (containing the data for a write, or an empty
+ * ping DMA for a read).
+ :*/
+/* Copyright 2006 Rusty Russell [EMAIL PROTECTED] IBM Corporation
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -25,27 +31,50 @@
 
 static char next_block_index = 'a';
 
+/*D:420 Here is the structure which holds all the information we need about
+ * each Guest block device.
+ *
+ * I'm sure at this stage, you're wondering "hey, where was the adventure I was
+ * promised?" and thinking "Rusty sucks, I shall say nasty things about him on
+ * my blog".  I think Real adventures have boring bits, too, and you're in the
+ * middle of one.  But it gets better.  Just not quite yet. */
 struct blockdev
 {
+   /* The block queue infrastructure wants a spinlock: it is held while it
+* calls our block request function.  We grab it in our interrupt
+* handler so the responses don't mess with new requests. */
spinlock_t lock;
 
-   /* The disk structure for the kernel. */
+   /* The disk structure registered with kernel. */
struct gendisk *disk;
 
-   /* The major number for this disk. */
+   /* The major device number for this disk, and the interrupt.  We only
+* really keep them here for completeness; we'd need them if we
+* supported device unplugging. */
int major;
int irq;
 
+   /* The physical address of this device's memory page */
unsigned long phys_addr;
-   /* The mapped block page. */
+   /* The mapped memory page for convenient access. */
struct lguest_block_page *lb_page;
 
-   /* We only have a single request outstanding at a time. */
+   /* We only have a single request outstanding at a time: this is it. */
struct lguest_dma dma;
struct request *req;
 };
 
-/* Jens gave me this nice helper to end all chunks of a request. */
+/*D:495 We originally used end_request() throughout the driver, but it turns
+ * out that end_request() is deprecated, and doesn't actually end the request
+ * (which seems like a good reason to deprecate it!).  It simply ends the first
+ * bio.  So if we had 3 bios in a struct request we would do all 3,
+ * end_request(), do 2, end_request(), do 1 and end_request(): twice as much
+ * work as we needed to do.
+ *
+ * This reinforced to me that I do not understand the block layer.
+ *
+ * Nonetheless, Jens Axboe gave me this nice helper to end all chunks of a
+ * request.  This improved disk speed by 130%. */
 static void end_entire_request(struct request *req, int uptodate)
 {
if (end_that_request_first(req, uptodate, req->hard_nr_sectors))
@@ -55,30 +84,62 @@ static void end_entire_request(struct re
end_that_request_last(req, uptodate);
 }
 
+/* I'm told there are only two stories in the world worth telling: love and
+ * hate.  So there used to be a love scene here like this:
+ *
+ *  Launcher:  We could make beautiful I/O together, you and I.
+ *  Guest: My, that's a big disk!
+ *
+ * Unfortunately, it was just too raunchy for our otherwise-gentle tale. */
+
+/*D:490 This is the interrupt handler, called when a block read or write has
+ * been completed for us. */
 static irqreturn_t lgb_irq(int irq, void *_bd)
 {
+   /* We handed our struct blockdev as the argument to request_irq(), so
+* it is passed through to us here.  This tells us which device we're
+* dealing with in case we have more than one. */
struct blockdev *bd = _bd;
unsigned long flags;
 
+   /* We weren't doing anything?  Strange, but could happen if we shared
+* interrupts (we don't!). */
if (!bd->req) {
pr_debug("No work!\n");
return IRQ_NONE;
}
 
+   /* Not done yet?  That's equally strange. */
if (!bd->lb_page->result) {
pr_debug("No result!\n");
return IRQ_NONE;
}
 
+   /* We have 
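
The "twice as much work" arithmetic in the D:495 comment above can be
checked with a small sketch.  It assumes the model the comment describes:
each completion ends only the first remaining chunk, and the remaining
chunks are then processed again.  The function names are hypothetical and
nothing here touches the block layer.

```c
#include <assert.h>

/* Chunks processed when each pass finishes only one chunk and the rest are
 * redone: n + (n-1) + ... + 1. */
static unsigned int chunks_done_one_at_a_time(unsigned int n)
{
	unsigned int total = 0;

	assert(n <= 65535);  /* keeps the sum within 32 bits */
	for (unsigned int left = n; left > 0; left--)
		total += left;
	assert(total == n * (n + 1) / 2);
	return total;
}

/* Chunks processed when the whole request is ended in one go. */
static unsigned int chunks_done_all_at_once(unsigned int n)
{
	return n;
}
```

For a request of 3 chunks the first function gives 3 + 2 + 1 = 6 and the
second gives 3, which is the factor of two the comment mentions.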

[PATCH 6/7] lguest: documentation pt VI: Switcher

2007-07-20 Thread Rusty Russell
Documentation: The Switcher

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

---
 drivers/lguest/core.c |   51 +++-
 drivers/lguest/switcher.S |  271 ++---
 2 files changed, 276 insertions(+), 46 deletions(-)

===
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -394,46 +394,89 @@ static void set_ts(void)
write_cr0(cr0|8);
 }
 
+/*S:010
+ * We are getting close to the Switcher.
+ *
+ * Remember that each CPU has two pages which are visible to the Guest when it
+ * runs on that CPU.  This has to contain the state for that Guest: we copy the
+ * state in just before we run the Guest.
+ *
+ * Each Guest has "changed" flags which indicate what has changed in the Guest
+ * since it last ran.  We saw this set in interrupts_and_traps.c and
+ * segments.c.
+ */
 static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
 {
+   /* Copying all this data can be quite expensive.  We usually run the
+* same Guest we ran last time (and that Guest hasn't run anywhere else
+* meanwhile).  If that's not the case, we pretend everything in the
+* Guest has changed. */
if (__get_cpu_var(last_guest) != lg || lg->last_pages != pages) {
__get_cpu_var(last_guest) = lg;
lg->last_pages = pages;
lg->changed = CHANGED_ALL;
}
 
-   /* These are pretty cheap, so we do them unconditionally. */
+   /* These copies are pretty cheap, so we do them unconditionally: */
+   /* Save the current Host top-level page directory. */
pages->state.host_cr3 = __pa(current->mm->pgd);
+   /* Set up the Guest's page tables to see this CPU's pages (and no
+* other CPU's pages). */
map_switcher_in_guest(lg, pages);
+   /* Set up the two TSS members which tell the CPU what stack to use
+* for traps which go directly into the Guest (ie. traps at privilege
+* level 1). */
pages->state.guest_tss.esp1 = lg->esp1;
pages->state.guest_tss.ss1 = lg->ss1;
 
-   /* Copy direct trap entries. */
+   /* Copy direct-to-Guest trap entries. */
if (lg->changed & CHANGED_IDT)
copy_traps(lg, pages->state.guest_idt, default_idt_entries);
 
-   /* Copy all GDT entries but the TSS. */
+   /* Copy all GDT entries which the Guest can change. */
if (lg->changed & CHANGED_GDT)
copy_gdt(lg, pages->state.guest_gdt);
/* If only the TLS entries have changed, copy them. */
else if (lg->changed & CHANGED_GDT_TLS)
copy_gdt_tls(lg, pages->state.guest_gdt);
 
+   /* Mark the Guest as unchanged for next time. */
lg->changed = 0;
 }
 
+/* Finally: the code to actually call into the Switcher to run the Guest. */
 static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
 {
+   /* This is a dummy value we need for GCC's sake. */
unsigned int clobber;
 
+   /* Copy the guest-specific information into this CPU's struct
+* lguest_pages. */
copy_in_guest_info(lg, pages);
 
-   /* Put eflags on stack, lcall does rest: suitable for iret return. */
+   /* Now: we push the eflags register on the stack, then do an lcall.
+* This is how we change from using the kernel code segment to using
+* the dedicated lguest code segment, as well as jumping into the
+* Switcher.
+*
+* The lcall also pushes the old code segment (KERNEL_CS) onto the
+* stack, then the address of this call.  This stack layout happens to
+* exactly match the stack of an interrupt... */
asm volatile("pushf; lcall *lguest_entry"
+/* This is how we tell GCC that %eax ("a") and %ebx ("b")
+ * are changed by this routine.  The "=" means output. */
 : "=a"(clobber), "=b"(clobber)
+/* %eax contains the pages pointer.  ("0" refers to the
+ * 0-th argument above, ie "a").  %ebx contains the
+ * physical address of the Guest's top-level page
+ * directory. */
 : "0"(pages), "1"(__pa(lg->pgdirs[lg->pgdidx].pgdir))
+/* We tell gcc that all these registers could change,
+ * which means we don't have to save and restore them in
+ * the Switcher. */
 : "memory", "%edx", "%ecx", "%edi", "%esi");
 }
+/*:*/
 
 /*H:030 Let's jump straight to the the main loop which runs the Guest.
  * Remember, this is called by the Launcher reading /dev/lguest, and we keep
===
--- a/drivers/lguest/switcher.S
+++ b/drivers/lguest/switcher.S
@@ -6,41 +6,131 @@
  * are feeling invigorated and refreshed then the next, more challenging stage
  * can be found in make Guest. :*/
 
+/*S:100

Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty [EMAIL PROTECTED] wrote:
>
>>> This code doesn't have all the ghastly tricks which we deploy to handle
>>> concurrent truncate.
>>
>> Do I need to ? Baaahh!!  I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..


As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?

--
SUSE Labs, Novell Inc.


[PATCH 7/7] lguest: documentation pt VII: FIXMEs

2007-07-20 Thread Rusty Russell
Documentation: The FIXMEs

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

---
 Documentation/lguest/lguest.c |   12 
 drivers/char/hvc_lguest.c |3 +++
 drivers/lguest/interrupts_and_traps.c |   14 ++
 drivers/lguest/io.c   |   10 ++
 drivers/lguest/lguest.c   |8 
 drivers/lguest/lguest_asm.S   |   14 ++
 drivers/lguest/page_tables.c  |5 +
 drivers/lguest/segments.c |4 
 drivers/net/lguest_net.c  |   19 +++
 9 files changed, 89 insertions(+)

===
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1536,3 +1536,15 @@ int main(int argc, char *argv[])
/* Finally, run the Guest.  This doesn't return. */
run_guest(lguest_fd, device_list);
 }
+/*:*/
+
+/*M:999
+ * Mastery is done: you now know everything I do.
+ *
+ * But surely you have seen code, features and bugs in your wanderings which
+ * you now yearn to attack?  That is the real game, and I look forward to you
+ * patching and forking lguest into the Your-Name-Here-visor.
+ *
+ * Farewell, and good coding!
+ * Rusty Russell.
+ */
===
--- a/drivers/char/hvc_lguest.c
+++ b/drivers/char/hvc_lguest.c
@@ -13,6 +13,9 @@
  * functions.
  :*/
 
+/*M:002 The console can be flooded: while the Guest is processing input the
+ * Host can send more.  Buffering in the Host could alleviate this, but it is a
+ * difficult problem in general. :*/
 /* Copyright (C) 2006 Rusty Russell, IBM Corporation
  *
  * This program is free software; you can redistribute it and/or modify
===
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -231,6 +231,20 @@ static int direct_trap(const struct lgue
 * go direct, of course 8) */
return idt_type(trap->a, trap->b) == 0xF;
 }
+/*:*/
+
+/*M:005 The Guest has the ability to turn its interrupt gates into trap gates,
+ * if it is careful.  The Host will let trap gates go directly to the
+ * Guest, but the Guest needs the interrupts atomically disabled for an
+ * interrupt gate.  It can do this by pointing the trap gate at instructions
+ * within noirq_start and noirq_end, where it can safely disable interrupts. */
+
+/*M:006 The Guests do not use the sysenter (fast system call) instruction,
+ * because it's hardcoded to enter privilege level 0 and so can't go direct.
+ * It's about twice as fast as the older int 0x80 system call, so it might
+ * still be worthwhile to handle it in the Switcher and lcall down to the
+ * Guest.  The sysenter semantics are hairy tho: search for that keyword in
+ * entry.S :*/
 
 /*H:260 When we make traps go directly into the Guest, we need to make sure
  * the kernel stack is valid (ie. mapped in the page tables).  Otherwise, the
===
--- a/drivers/lguest/io.c
+++ b/drivers/lguest/io.c
@@ -553,6 +553,16 @@ void release_all_dma(struct lguest *lg)
up_read(&lg->mm->mmap_sem);
 }
 
+/*M:007 We only return a single DMA buffer to the Launcher, but it would be
+ * more efficient to return a pointer to the entire array of DMA buffers, which
+ * it can cache and choose one whenever it wants.
+ *
+ * Currently the Launcher uses a write to /dev/lguest, and the return value is
+ * the address of the DMA structure with the interrupt number placed in
+ * dma->used_len.  If we wanted to return the entire array, we need to return
+ * the address, array size and interrupt number: this seems to require an
+ * ioctl(). :*/
+
 /*L:320 This routine looks for a DMA buffer registered by the Guest on the
  * given key (using the BIND_DMA hypercall). */
 unsigned long get_dma_buffer(struct lguest *lg,
===
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -251,6 +251,14 @@ static void irq_enable(void)
 {
lguest_data.irq_enabled = X86_EFLAGS_IF;
 }
+/*:*/
+/*M:003 Note that we don't check for outstanding interrupts when we re-enable
+ * them (or when we unmask an interrupt).  This seems to work for the moment,
+ * since interrupts are rare and we'll just get the interrupt on the next timer
+ * tick, but now we have CONFIG_NO_HZ, we should revisit this.  One way
+ * would be to put the irq_enabled field in a page by itself, and have the
+ * Host write-protect it when an interrupt comes in when irqs are disabled.
+ * There will then be a page fault as soon as interrupts are re-enabled. :*/
 
 /*G:034
  * The Interrupt Descriptor Table (IDT).
===
--- a/drivers/lguest/lguest_asm.S
+++ b/drivers/lguest/lguest_asm.S
@@ -41,6 +41,20 @@ LGUEST_PATCH(pushf, movl 

Re: [PATCH] AFS: Fix file locking

2007-07-20 Thread Nick Piggin

Andrew Morton wrote:
> On Wed, 18 Jul 2007 15:56:53 +1000 Nick Piggin [EMAIL PROTECTED] wrote:
>
>> Andrew Morton wrote:
>>> On Tue, 17 Jul 2007 13:47:32 +0100
>>> David Howells [EMAIL PROTECTED] wrote:
>>>
>>>> +   if (type == AFS_LOCK_READ &&
>>>> +   vnode->flags & (1 << AFS_VNODE_READLOCKED)) {
>>>
>>> Here we use
>>>
>>> vnode->flags & (1 << foo)
>>>
>>>> +   set_bit(AFS_VNODE_LOCKING, &vnode->flags);
>>>
>>> and elsewhere we use set_bit(foo, &vnode->flags) and clear_bit()
>>>
>>> This is a bit strange.  Does the open-coded bit-test have any performance
>>> benefit on any architecture?  Not on x86 at least, afaik.
>>
>> It uses locked operations on x86, but you can use __set_bit instead
>> (which should always be at least as efficient as the C version).
>
> I said bit-test.  ie: test_bit().  That doesn't use a locked operation.


So you did. Then to answer that, yes it could be faster because there are
stupid volatiles sprinkled all over the bitops code so you could easily
end up having to do more loads. Does it make a real difference? Unlikely,
but David loves counting cycles :)
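
For readers following along, the two styles under discussion can be
sketched in plain user-space C.  This is not the kernel's bitops code: the
demo_* helpers are hypothetical and only loosely modelled on a generic C
implementation, with the volatile-qualified pointer standing in for the
"stupid volatiles" mentioned above.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Open-coded test on a value the compiler may keep in a register. */
static int demo_test_open(unsigned long flags, unsigned int bit)
{
	assert(bit < DEMO_BITS_PER_LONG);
	return (flags & (1UL << bit)) != 0;
}

/* Helper that reads through a pointer to volatile, so every call must
 * reload the word from memory. */
static int demo_test_volatile(const volatile unsigned long *addr,
			      unsigned int nr)
{
	return (int)((addr[nr / DEMO_BITS_PER_LONG]
		      >> (nr % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Non-atomic set, in the spirit of __set_bit(): a plain read-modify-write
 * with no locked instruction. */
static void demo_set_nonatomic(unsigned long *addr, unsigned int nr)
{
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}
```

Both tests return the same answer; the difference is only in how many loads
the compiler is obliged to emit when several tests are made in a row.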

--
SUSE Labs, Novell Inc.


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:

On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
  On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
   Just one of my machines to 2.6.22.1, and got this during boot..
  
   Starting udev: udevd-event[619]: udev_node_symlink: 
symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File 
exists
  
   Under 2.6.21, all was fine.
  
   sdc is one disk of a 3 disk raid5 set.
   The raidset still manages to come up despite this.
  
   This is a Fedora 7 box, with udev-106-4.1.fc7
  
   What changed this time?
 
  CONFIG_BLK_DEV_BSG=y?
 
  There's a name-clash, because bsg tries to create devices with the same name.
  James sent a patch, it's on lkml.

BSG isn't in 2.6.22


Ok. Nothing else has changed that I can think of that could cause this.

The code in udev that prints this message looks like:
  err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));

That doesn't really match what you posted. Are there chars missing?
Can you please recheck?
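
For reference, the "File exists" text is what strerror() returns for EEXIST,
which symlink(2) reports when the link name is already present.  A minimal
stand-alone sketch, not taken from udev: demo_make_link() and
demo_symlink_twice() are hypothetical helpers, and the temporary directory
and link name are made up for the demonstration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create linkpath pointing at target.  Returns 0 on success, or the errno
 * value on failure after printing a message in the same shape as above. */
static int demo_make_link(const char *target, const char *linkpath)
{
	int err;

	if (symlink(target, linkpath) == 0)
		return 0;
	err = errno;
	fprintf(stderr, "symlink(%s, %s) failed: %s\n",
		target, linkpath, strerror(err));
	return err;
}

/* Create the same link name twice in a scratch directory and return the
 * result of the second attempt (or -1 if the setup itself failed). */
static int demo_symlink_twice(void)
{
	char dir[] = "/tmp/demo-udev-XXXXXX";
	char linkpath[64];
	int n, first, second;

	if (mkdtemp(dir) == NULL)
		return -1;
	n = snprintf(linkpath, sizeof(linkpath), "%s/by-uuid-demo", dir);
	assert(n > 0 && (size_t)n < sizeof(linkpath));

	first = demo_make_link("../../sdd", linkpath);   /* name is new */
	second = demo_make_link("../../sdc", linkpath);  /* name is taken */

	unlink(linkpath);
	rmdir(dir);
	return first == 0 ? second : -1;
}
```

The second call fails with EEXIST even though the two targets differ, since
symlink() never replaces an existing name.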

And what does:
 udevtest /block/sdc
print?

Thanks,
Kay


Re: [PATCH 2/3] i386: use x86_64's desc_def.h

2007-07-20 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
 On Thu, 2007-07-19 at 09:27 +1000, Rusty Russell wrote:
  On Wed, 2007-07-18 at 09:19 -0700, Zachary Amsden wrote:
+#define GET_CONTENTS(desc) (((desc)->raw32.b >> 10) & 3)
+#define GET_WRITABLE(desc) (((desc)->raw32.b >>  9) & 1)
   
   You got rid of the duplicate definitions here, but then added new 
   duplicates (GET_CONTENTS / WRITABLE).  Can you stick them in desc.h?
  
  To be honest, I got sick of counting bits at this point, and didn't want
  to introduce bugs.
  
  Here's the updated version of PATCH 1/3:
 
 And 2/3:
 ===
 i386: use x86_64's desc_def.h

plus this needed as well now

Index: linus-2.6/include/asm-i386/xen/hypercall.h
===
--- linus-2.6.orig/include/asm-i386/xen/hypercall.h
+++ linus-2.6/include/asm-i386/xen/hypercall.h
@@ -359,8 +359,8 @@ MULTI_update_descriptor(struct multicall
mcl->op = __HYPERVISOR_update_descriptor;
mcl->args[0] = maddr;
mcl->args[1] = maddr >> 32;
-   mcl->args[2] = desc.a;
-   mcl->args[3] = desc.b;
+   mcl->args[2] = desc.raw32.a;
+   mcl->args[3] = desc.raw32.b;
 }
 
 static inline void
Index: linus-2.6/drivers/lguest/interrupts_and_traps.c
===
--- linus-2.6.orig/drivers/lguest/interrupts_and_traps.c
+++ linus-2.6/drivers/lguest/interrupts_and_traps.c
@@ -103,9 +103,9 @@ void maybe_do_interrupt(struct lguest *l
}
 
idt = &lg->idt[FIRST_EXTERNAL_VECTOR+irq];
-   if (idt_present(idt->a, idt->b)) {
+   if (idt_present(idt->raw32.a, idt->raw32.b)) {
clear_bit(irq, lg->irqs_pending);
-   set_guest_interrupt(lg, idt->a, idt->b, 0);
+   set_guest_interrupt(lg, idt->raw32.a, idt->raw32.b, 0);
}
 }
 
@@ -116,7 +116,7 @@ static int has_err(unsigned int trap)
 
 int deliver_trap(struct lguest *lg, unsigned int num)
 {
-   u32 lo = lg->idt[num].a, hi = lg->idt[num].b;
+   u32 lo = lg->idt[num].raw32.a, hi = lg->idt[num].raw32.b;
 
if (!idt_present(lo, hi))
return 0;
@@ -139,7 +139,7 @@ static int direct_trap(const struct lgue
return 0;
 
/* Interrupt gates (0xE) or not present (0x0) can't go direct. */
-   return idt_type(trap->a, trap->b) == 0xF;
+   return idt_type(trap->raw32.a, trap->raw32.b) == 0xF;
 }
 
 void pin_stack_pages(struct lguest *lg)
@@ -170,15 +170,15 @@ static void set_trap(struct lguest *lg, 
u8 type = idt_type(lo, hi);
 
if (!idt_present(lo, hi)) {
-   trap->a = trap->b = 0;
+   trap->raw32.a = trap->raw32.b = 0;
return;
}
 
if (type != 0xE && type != 0xF)
kill_guest(lg, "bad IDT type %i", type);
 
-   trap->a = ((__KERNEL_CS|GUEST_PL)<<16) | (lo&0x0000FFFF);
-   trap->b = (hi&0xFFFFEF00);
+   trap->raw32.a = ((__KERNEL_CS|GUEST_PL)<<16) | (lo&0x0000FFFF);
+   trap->raw32.b = (hi&0xFFFFEF00);
 }
 
 void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi)
@@ -204,8 +204,8 @@ static void default_idt_entry(struct des
if (trap == LGUEST_TRAP_ENTRY)
flags |= (GUEST_PL << 13);
 
-   idt->a = (LGUEST_CS<<16) | (handler&0x0000FFFF);
-   idt->b = (handler&0xFFFF0000) | flags;
+   idt->raw32.a = (LGUEST_CS<<16) | (handler&0x0000FFFF);
+   idt->raw32.b = (handler&0xFFFF0000) | flags;
 }
 
 void setup_default_idt_entries(struct lguest_ro_state *state,
Index: linus-2.6/drivers/lguest/lg.h
===
--- linus-2.6.orig/drivers/lguest/lg.h
+++ linus-2.6/drivers/lguest/lg.h
@@ -44,8 +44,8 @@ void free_pagetables(void);
 int init_pagetables(struct page **switcher_page, unsigned int pages);
 
 /* Full 4G segment descriptors, suitable for CS and DS. */
-#define FULL_EXEC_SEGMENT ((struct desc_struct){0x0000ffff, 0x00cf9b00})
-#define FULL_SEGMENT ((struct desc_struct){0x0000ffff, 0x00cf9300})
+#define FULL_EXEC_SEGMENT ((struct desc_struct){ {0x00cf9b000000ffffULL} })
+#define FULL_SEGMENT ((struct desc_struct){ {0x00cf93000000ffffULL} })
 
 struct lguest_dma_info
 {
Index: linus-2.6/drivers/lguest/lguest.c
===
--- linus-2.6.orig/drivers/lguest/lguest.c
+++ linus-2.6/drivers/lguest/lguest.c
@@ -173,7 +173,7 @@ static void lguest_load_idt(const struct
struct desc_struct *idt = (void *)desc->address;
 
for (i = 0; i < (desc->size+1)/8; i++)
-   hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].a, idt[i].b);
+   hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b);
 }
 
 static void lguest_load_gdt(const struct Xgt_desc_struct *desc)
Index: linus-2.6/drivers/lguest/segments.c
===
--- linus-2.6.orig/drivers/lguest/segments.c
+++ linus-2.6/drivers/lguest/segments.c
@@ -3,12 +3,12 @@
 static int 
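
The raw32.a / raw32.b accesses and the GET_CONTENTS / GET_WRITABLE macros
quoted above can be tried out with a small user-space sketch.  The union
below is a hypothetical stand-in, not the kernel's descriptor definition;
it only assumes a 64-bit descriptor that can also be viewed as two 32-bit
words, and the example values are the high words that appear in the lg.h
hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a segment descriptor: one 64-bit value that can
 * also be read as two 32-bit halves.  On a little-endian machine such as
 * x86, raw32.a is the low word and raw32.b the high word. */
union demo_desc {
	uint64_t raw64;
	struct {
		uint32_t a, b;
	} raw32;
};

static_assert(sizeof(union demo_desc) == 8, "descriptor must be 8 bytes");

/* Same bit positions as the macros quoted in the thread: bits 10-11 and
 * bit 9 of the high word. */
#define DEMO_GET_CONTENTS(desc) (((desc)->raw32.b >> 10) & 3)
#define DEMO_GET_WRITABLE(desc) (((desc)->raw32.b >>  9) & 1)

/* Build a descriptor from its two halves. */
static union demo_desc demo_desc_from_halves(uint32_t a, uint32_t b)
{
	union demo_desc d;

	d.raw32.a = a;
	d.raw32.b = b;
	return d;
}
```

With a high word of 0x00cf9b00 the contents field reads 2 and the writable
bit reads 1; with 0x00cf9300 the contents field reads 0 and the writable
bit is still 1.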

Re: [PATCH 3/3] i386: Replace struct Xgt_desc_struct with struct desc_ptr

2007-07-20 Thread Chris Wright
* Rusty Russell ([EMAIL PROTECTED]) wrote:
 Remove i386's Xgt_desc_struct definition and use desc_def.h's desc_ptr.

plus this is needed now


Index: linus-2.6/drivers/lguest/lg.h
===
--- linus-2.6.orig/drivers/lguest/lg.h
+++ linus-2.6/drivers/lguest/lg.h
@@ -91,13 +91,13 @@ struct lguest_ro_state
 {
/* Host information we need to restore when we switch back. */
u32 host_cr3;
-   struct Xgt_desc_struct host_idt_desc;
-   struct Xgt_desc_struct host_gdt_desc;
+   struct desc_ptr host_idt_desc;
+   struct desc_ptr host_gdt_desc;
u32 host_sp;
 
/* Fields which are used when guest is running. */
-   struct Xgt_desc_struct guest_idt_desc;
-   struct Xgt_desc_struct guest_gdt_desc;
+   struct desc_ptr guest_idt_desc;
+   struct desc_ptr guest_gdt_desc;
struct i386_hw_tss guest_tss;
struct desc_struct guest_idt[IDT_ENTRIES];
struct desc_struct guest_gdt[GDT_ENTRIES];
Index: linus-2.6/arch/i386/xen/enlighten.c
===
--- linus-2.6.orig/arch/i386/xen/enlighten.c
+++ linus-2.6/arch/i386/xen/enlighten.c
@@ -301,7 +301,7 @@ static void xen_set_ldt(const void *addr
xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
 
-static void xen_load_gdt(const struct Xgt_desc_struct *dtr)
+static void xen_load_gdt(const struct desc_ptr *dtr)
 {
unsigned long *frames;
unsigned long va = dtr->address;
@@ -401,7 +401,7 @@ static int cvt_gate_to_trap(int vector, 
 }
 
 /* Locations of each CPU's IDT */
-static DEFINE_PER_CPU(struct Xgt_desc_struct, idt_desc);
+static DEFINE_PER_CPU(struct desc_ptr, idt_desc);
 
 /* Set an IDT entry.  If the entry is part of the current IDT, then
also update Xen. */
@@ -433,7 +433,7 @@ static void xen_write_idt_entry(struct d
preempt_enable();
 }
 
-static void xen_convert_trap_info(const struct Xgt_desc_struct *desc,
+static void xen_convert_trap_info(const struct desc_ptr *desc,
  struct trap_info *traps)
 {
unsigned in, out, count;
@@ -452,7 +452,7 @@ static void xen_convert_trap_info(const 
 
 void xen_copy_trap_info(struct trap_info *traps)
 {
-   const struct Xgt_desc_struct *desc = &__get_cpu_var(idt_desc);
+   const struct desc_ptr *desc = &__get_cpu_var(idt_desc);
 
xen_convert_trap_info(desc, traps);
 }
@@ -460,7 +460,7 @@ void xen_copy_trap_info(struct trap_info
 /* Load a new IDT into Xen.  In principle this can be per-CPU, so we
hold a spinlock to protect the static traps[] array (static because
it avoids allocation, and saves stack space). */
-static void xen_load_idt(const struct Xgt_desc_struct *desc)
+static void xen_load_idt(const struct desc_ptr *desc)
 {
static DEFINE_SPINLOCK(lock);
static struct trap_info traps[257];
Index: linus-2.6/drivers/lguest/lguest.c
===
--- linus-2.6.orig/drivers/lguest/lguest.c
+++ linus-2.6/drivers/lguest/lguest.c
@@ -167,7 +167,7 @@ static void lguest_write_idt_entry(struc
hcall(LHCALL_LOAD_IDT_ENTRY, entrynum, low, high);
 }
 
-static void lguest_load_idt(const struct Xgt_desc_struct *desc)
+static void lguest_load_idt(const struct desc_ptr *desc)
 {
unsigned int i;
struct desc_struct *idt = (void *)desc->address;
@@ -176,7 +176,7 @@ static void lguest_load_idt(const struct
hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b);
 }
 
-static void lguest_load_gdt(const struct Xgt_desc_struct *desc)
+static void lguest_load_gdt(const struct desc_ptr *desc)
 {
BUG_ON((desc->size+1)/8 != GDT_ENTRIES);
hcall(LHCALL_LOAD_GDT, __pa(desc->address), GDT_ENTRIES, 0);


Re: [PATCH] hugetlbfs read() support

2007-07-20 Thread Nick Piggin

(sorry if this is a resend... something bad seems to have happened to me)

Andrew Morton wrote:
> On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty [EMAIL PROTECTED] wrote:
>
>>> This code doesn't have all the ghastly tricks which we deploy to handle
>>> concurrent truncate.
>>
>> Do I need to ? Baaahh!!  I don't want to deal with them.
>
> Nick, can you think of any serious consequences of a read/truncate race in
> there?  I can't..


As it doesn't allow writes, then I _think_ it should be OK. If you
ever did want to add write(2) support, then you would have transient
zeroes problems.

But I'm not completely sure.. we've had a lot of (and still have
some known and probably unknown) bugs just in that single
generic_mapping_read function, most of which are due to our rabid
aversion to doing any locking whatsoever there.

So why not just hold i_mutex around the whole thing to be safe?

--
SUSE Labs, Novell Inc.


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Dave Jones
On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
  On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
   On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
  Just one of my machines to 2.6.22.1, and got this during boot..
 
  Starting udev: udevd-event[619]: udev_node_symlink: 
   symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) 
   failed: File exists
 
  Under 2.6.21, all was fine.
 
  sdc is one disk of a 3 disk raid5 set.
  The raidset still manages to come up despite this.
 
  This is a Fedora 7 box, with udev-106-4.1.fc7
 
  What changed this time?

 CONFIG_BLK_DEV_BSG=y?

 There's a name-clash, because bsg tries to create devices with the same 
   name.
 James sent a patch, it's on lkml.
  
   BSG isn't in 2.6.22
  
  Ok. There has nothing else changed, that I could think of what could cause 
  this.
  
  The code in udev that prints this message looks like:
 err(symlink(%s, %s) failed: %s, linktarget, filename, strerror(errno));
  
  That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc.
/dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

note that uuid matches sdd instead.

  And what does:
udevtest /block/sdc
  print?

parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node,
or run any program specified by a RUN key. It may show incorrect results,
if rules match against subsystem specfic kernel event variables.

main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 
'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 
'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 
'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 
'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: '/lib/udev/vol_id' returned with status 0
udev_rules_get_name: add symlink 
'disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89'

Re: [PATCH] AFS: Fix file locking

2007-07-20 Thread Linus Torvalds


On Fri, 20 Jul 2007, Nick Piggin wrote:
 
 So you did. Then to answer that, yes it could be faster because there are
 stupid volatiles sprinkled all over the bitops code so you could easily
 end up having to do more loads. Does it make a real difference? Unlikely,
 but David loves counting cycles :)

I thought we long long since removed the volatiles. They are buggy and 
horrible, and we really want to let the compiler combine multiple 
test-bits, and if they matter that implies locking is buggy or something 
worse..

Ie we'd *want*

if (test_bit(x, y) || test_bit(z,y))

to be rewritten by the compiler as testing bits x/z at the same time.
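
For illustration, here is a minimal userspace sketch of that rewrite. It is not the kernel's bitops code: test_bit_word(), either_bit_separate() and either_bit_combined() are hypothetical names, and the bitmap is simplified to a single unsigned long. With no volatile on the load, a compiler is free to turn the first form into the second.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Plain (non-volatile) test of one bit in a single word. */
static bool test_bit_word(unsigned int nr, const unsigned long *word)
{
	assert(nr < BITS_PER_LONG);
	return (*word >> nr) & 1UL;
}

/* What the source says: two separate bit tests. */
static bool either_bit_separate(unsigned int x, unsigned int z,
				const unsigned long *y)
{
	return test_bit_word(x, y) || test_bit_word(z, y);
}

/* What the compiler may emit instead: one load and one mask test. */
static bool either_bit_combined(unsigned int x, unsigned int z,
				const unsigned long *y)
{
	unsigned long mask;

	assert(x < BITS_PER_LONG && z < BITS_PER_LONG);
	mask = (1UL << x) | (1UL << z);
	return (*y & mask) != 0;
}
```

Both functions return the same answer for any word and any pair of valid bit numbers; the combined form just reads the word once.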

But now I'm too scared to look.

Linus


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Andrew Morton
On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:

 --- a/kernel/params.c
 +++ b/kernel/params.c
 @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
   kobject_set_name(&mk->kobj, name);
   kobject_init(&mk->kobj);
   ret = kobject_add(&mk->kobj);
 - BUG_ON(ret < 0);
 + if (ret) {
 + printk(KERN_ERR "module '%s' failed to be added to sysfs, "
 + "the system will be unstable now.\n", name);
 + return;
 + }

It would be nice to print the value of `ret' too.


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Greg KH
On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
 On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:
 
  --- a/kernel/params.c
  +++ b/kernel/params.c
  @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
  kobject_set_name(&mk->kobj, name);
  kobject_init(&mk->kobj);
  ret = kobject_add(&mk->kobj);
  -   BUG_ON(ret < 0);
  +   if (ret) {
  +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
  +   "the system will be unstable now.\n", name);
  +   return;
  +   }
 
 It would be nice to print the value of `ret' too.

Ok, how about this version:

---
 kernel/params.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
kobject_set_name(&mk->kobj, name);
kobject_init(&mk->kobj);
ret = kobject_add(&mk->kobj);
-   BUG_ON(ret < 0);
+   if (ret) {
+   printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
+ "error number %d\n", name, ret);
+   printk(KERN_ERR "The system will be unstable now.\n");
+   return;
+   }
param_sysfs_setup(mk, kparam, num_params, name_skip);
kobject_uevent(&mk->kobj, KOBJ_ADD);
 }
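
As a rough userspace analogue of what this patch does (report the error number and return, rather than stopping on BUG_ON), consider the sketch below. registry_add() and setup_one() are hypothetical stand-ins, not kernel APIs; the -EINVAL and -EEXIST cases are only meant to mirror a missing name and a duplicate name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJECTS 8

static const char *registered[MAX_OBJECTS];
static int nr_registered;

/* Hypothetical "add" call: returns 0 on success or a negative errno. */
static int registry_add(const char *name)
{
	int i;

	assert(nr_registered <= MAX_OBJECTS);
	if (!name || !*name)
		return -EINVAL;		/* no name given */
	for (i = 0; i < nr_registered; i++)
		if (strcmp(registered[i], name) == 0)
			return -EEXIST;	/* name already registered */
	if (nr_registered == MAX_OBJECTS)
		return -ENOMEM;		/* table is full */
	registered[nr_registered++] = name;
	return 0;
}

/* Log the failure together with its error number, then return it. */
static int setup_one(const char *name)
{
	int ret = registry_add(name);

	if (ret) {
		fprintf(stderr,
			"Module '%s' failed to be added, error number %d\n",
			name ? name : "(null)", ret);
		return ret;
	}
	return 0;
}
```

Printing the error number is what lets a reader of the log tell a duplicate name apart from a missing one.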


Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:

On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
  On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
   On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
 On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
  Just one of my machines to 2.6.22.1, and got this during boot..
 
  Starting udev: udevd-event[619]: udev_node_symlink: 
symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
 
  Under 2.6.21, all was fine.
 
  sdc is one disk of a 3 disk raid5 set.
  The raidset still manages to come up despite this.
 
  This is a Fedora 7 box, with udev-106-4.1.fc7
 
  What changed this time?

 CONFIG_BLK_DEV_BSG=y?

 There's a name-clash, because bsg tries to create devices with the same 
name.
 James sent a patch, it's on lkml.
  
   BSG isn't in 2.6.22
 
  Ok. Nothing else has changed that I can think of that could cause 
this.
 
  The code in udev that prints this message looks like:
 err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));
 
  That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc.
/dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

note that uuid matches sdd instead.

  And what does:
udevtest /block/sdc
  print?

parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node,
or run any program specified by a RUN key. It may show incorrect results,
if rules match against subsystem specfic kernel event variables.

main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 
'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 
'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 
'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 
'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: '/lib/udev/vol_id' returned with status 0
udev_rules_get_name: add symlink 

Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Steven Rostedt
On Sat, 21 Jul 2007, Arnd Bergmann wrote:

 On Saturday 21 July 2007, Thomas Gleixner wrote:


 In my experience, it's very helpful to have a single set of header
 files, and merging the two versions of one header usually exposes
 bugs that have been fixed in only one of the two, so you get
 to fix actual bugs in the process.

This can still be done after the merge tglx did.


 In the s390 merge, I also started out in an attempt to guarantee
 unchanged object files, much like what you describe. However, it
 turned out that fixing it in the process is actually easier.
 Either way, 'diff -D __x86_64__' is a great tool for a start, you
 should try it out to see how easy it is to merge a lot of files.
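
For readers who have not used it: GNU diff's -D NAME (--ifdef=NAME) option merges two files into one and wraps the differing lines in preprocessor conditionals. The fragment below is a hand-written sketch of the shape of that output, not taken from any kernel file; align_to_word() and its constants are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: round n up to a multiple of the word size. */
static unsigned long align_to_word(unsigned long n)
{
#ifndef __x86_64__
	const unsigned long word = 4;	/* line from the 32-bit file */
#else /* __x86_64__ */
	const unsigned long word = 8;	/* line from the 64-bit file */
#endif /* __x86_64__ */

	assert(n <= ULONG_MAX - (word - 1));	/* rounding must not overflow */
	return (n + word - 1) & ~(word - 1);
}
```

Lines common to both inputs appear once; only the lines that differ end up inside the conditional, which is what makes the result a reasonable starting point for a manual cleanup.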

 To put it into perspective, I think the s390 merge was a lot easier
 than the x86 merge, because there is only a very limited set of
 hardware configurations for s390 compared to others. We ended up
 doing the full merge with three people within less than a week
 and no separate files at all.

This is the big reason they wanted to keep it binary identical: there
are just way too many different configs out there in the x86 world.


 OTOH, the powerpc merge is now going into its third year, mostly
 because it was started with the intention to remove all cruft
 in the process and to only allow sane code into the new architecture.

I'd expect x86 to move much faster, just because there are more developers
and users of x86 PCs than there are for powerpc.


 The steps that I'd suggest instead are:

 * merge all exported header files of the two architectures. This
   alone is a worthy goal, because it allows us to get rid of
   the ugly code for deciding which version to use in installed
   headers and elsewhere.

I don't see why this can't be done after the first Big merge.


 * Create an arch/x86/Makefile that descends into ../i386/* and
   ../x86_64/* instead of its subdirectories.

The thing that Thomas pointed out is that the physical location of the
source actually does matter. Having two files side by side with the same
name except for a _32.c and _64.c suffix makes a developer want to merge them.

A perfect example is looking at both
  arch/x86/kernel/module_{32,64}.c
One would be encouraged to make that into a single file. But with
an arch/i386/kernel/module.c and an arch/x86_64/kernel/module.c, it would
take some time before anyone would care.


 * Merge the arch/x86/* subdirectories, one at a time, starting with
   the low-hanging fruit like oprofile or pci, and do the hard
   ones like mm and kernel last.

You're looking at a 10-year-plus merge with that approach. I think that is
exactly what Ingo and Thomas _don't_ want. Doing it the big-bang way, as
posted in this patch, is the fastest way to get where we want to go.


 Unfortunately, I don't think I'll spend much time on this, so I
 don't get to decide on it, but you asked for feedback ;-)


I'm actually looking forward to helping out here ;-)

-- Steve



Re: film at 11: kernel update breaks udev.

2007-07-20 Thread Kay Sievers

On 7/21/07, Kay Sievers [EMAIL PROTECTED] wrote:

On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
 On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote:
   On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
  On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
   Just one of my machines to 2.6.22.1, and got this during boot..
  
   Starting udev: udevd-event[619]: udev_node_symlink: 
symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
  
   Under 2.6.21, all was fine.
  
   sdc is one disk of a 3 disk raid5 set.
   The raidset still manages to come up despite this.
  
   This is a Fedora 7 box, with udev-106-4.1.fc7
  
   What changed this time?
 
  CONFIG_BLK_DEV_BSG=y?
 
  There's a name-clash, because bsg tries to create devices with the 
same name.
  James sent a patch, it's on lkml.
   
BSG isn't in 2.6.22
  
   Ok. Nothing else has changed that I can think of that could cause 
this.
  
   The code in udev that prints this message looks like:
  err("symlink(%s, %s) failed: %s", linktarget, filename, 
strerror(errno));
  
   That doesn't really match what you posted. Are there chars missing?

 Umm. Now I'm confused. Note above that it's talking about sdc.
 /dev/disk/by-uuid/ contains ..

 lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
 lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
 lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
 lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

 note that uuid matches sdd instead.

   And what does:
 udevtest /block/sdc
   print?

 parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
 parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
 This program is for debugging only, it does not create any node,
 or run any program specified by a RUN key. It may show incorrect results,
 if rules match against subsystem specfic kernel event variables.

 main: looking at device '/block/sdc' from subsystem 'block'
 run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
 run_program: '/bin/bash' (stdout) 'dm_multipath   28889  0 '
 run_program: '/bin/bash' returned with status 0
 run_program: '/lib/udev/usb_id -x'
 run_program: '/lib/udev/usb_id' returned with status 1
 run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
 run_program: '/lib/udev/scsi_id' (stdout) 
'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
 run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
 run_program: '/lib/udev/scsi_id' returned with status 0
 udev_rules_get_name: add symlink 
'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
 run_program: '/lib/udev/path_id /block/sdc'
 run_program: '/lib/udev/path_id' (stdout) 
'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
 run_program: '/lib/udev/path_id' returned with status 0
 udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
 run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
 run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
 run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
 run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
 run_program: '/lib/udev/vol_id' (stdout) 
'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
 run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
 run_program: '/lib/udev/vol_id' (stdout) 

Re: net/ipv4/inetpeer.c stack warnings

2007-07-20 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 19 Jul 2007 14:48:59 +0200

 Gabriel C wrote:
  Hello ,
 
  I noticed on current git this warning in net/ipv4/inetpeer.c
 
 Yeah, I have no idea why the gcc people thought that this was
 something worth warning about. Especially since explicitly
 checking for != NULL silences the warning again.

Sigh, applied :-)


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Glauber de Oliveira Costa

On 7/20/07, Steven Rostedt [EMAIL PROTECTED] wrote:


 I really like the idea of a unified source tree for the 2 x86 variants.
 The technical differences are really small (of course there are
 differences, especially in the boot sequence), and striving to unify as
 much as possible while having a clean way to do per 32/64 bit parts as
 well is something that imo is the right thing.


Not to mention all the paravirt stuff that's going on. Having a single
x86 arch to work with would be greatly beneficial to the work being done
to port paravirt to x86_64.


As for paravirt, it'd really help. As I had let the tree lag behind by
so much, a great part of the work now is checking where i386 is,
seeing if it applies to 64-bit, and so on. The differences are not so
huge, and I'm trying my best not to let them deviate too much. It
could mostly be built incrementally.

And I bet a huge part of the tree could be like this too: in most
places, they are different for no particular reason, just because two
people implemented it separately. It would take a huge effort to bring
those differences to an end, but I think it'd pay off in future
development speed (not to mention the duplicate bugs Linus has
already talked about).


Way to go, Thomas and Ingo!

I am pretty much for it too.


--
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.


Re: PROBLEM: Dell Inspiron 1501 fails to boot in 2.6.21+

2007-07-20 Thread Glauber de Oliveira Costa

On 7/20/07, Mark Tiefenbruck [EMAIL PROTECTED] wrote:

I'd appreciate any help on getting this report sent to the appropriate
list and, of course, getting this fixed. I don't know what's useful,
so you're getting everything. This will be a very long e-mail.

My new laptop won't boot with kernel versions 2.6.21 or 2.6.22 . No
oops. No panic. It just stops printing messages. Maybe it would
eventually continue if I wait long enough, but it's unacceptable
either way. I include below the contents of dmesg for a working kernel
up to the point where it halts. I'm also including what it usually
does for a few lines after that point.

I did git-bisect on the 2.6.21.y tree. I'm including the result of
that as well. It mentions HPET, so I should mention my computer also
fails to boot when I enable HPET in my BIOS. I don't have the details
of this currently; I can reproduce it again if needed.

I've also included my kernel configuration and ver_linux output.
You'll notice that my gcc version is 4.2.0, but this also happens with
4.1.2. I'm including /proc/cpuinfo and lspci -vvv. I'm including
/proc/ioports and /proc/iomem. I don't have a /proc/scsi.

  Thanks,
 Mark


Here's the commit that causes the problem:



e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
commit e9e2cdb412412326c4827fc78ba27f410d837e6e
Author: Thomas Gleixner [EMAIL PROTECTED]
Date:   Fri Feb 16 01:28:04 2007 -0800

[PATCH] clockevents: i386 drivers

Add clockevent drivers for i386: lapic (local) and PIT/HPET
(global).  Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver.  The
assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()

Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.

No changes to existing functionality.

[ kdump fix from Vivek Goyal [EMAIL PROTECTED] ]
[ fixes based on review feedback from Arjan van de Ven
[EMAIL PROTECTED] ]
Cleanups-from: Adrian Bunk [EMAIL PROTECTED]
Build-fixes-from: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
Cc: john stultz [EMAIL PROTECTED]
Cc: Roman Zippel [EMAIL PROTECTED]
Cc: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Linus Torvalds [EMAIL PROTECTED]


As a wild guess, I'd bet that the rcu queues are failing to get called
(probably some problem with the timer interrupt in the APs?), thus
preventing the system from getting into a quiescent state.

It does seem timer related to me. Maybe one of the timer gurus has
something more to say on this?

--
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Yinghai Lu

On 7/20/07, Ingo Molnar [EMAIL PROTECTED] wrote:


* Jeff Garzik [EMAIL PROTECTED] wrote:

 I agree with Andi...  it's quite nice to be able to leave some
 arch/i386 stuff, and not carry it over to arch/x86-64.

we can leave those few items in arch/x86 just as much. No need to keep
around a legacy tree for that.


How about making all files and directories take _32 or _64 in the name,
except the files or dirs that are shared?

For example: k8_bus.c is only needed by 64 ===> change it to k8_bus_64.c
mach-generic ===> mach-generic_32

YH


Re: [PATCH 4/5] [V2] Define is_global_init() and is_container_init()

2007-07-20 Thread sukadev
Andrew Morton [EMAIL PROTECTED] wrote:
| On Thu, 19 Jul 2007 00:21:58 -0700
| [EMAIL PROTECTED] wrote:
| 
|  --- lx26-22-rc6-mm1a.orig/kernel/pid.c  2007-07-16 12:55:15.0 -0700
|  +++ lx26-22-rc6-mm1a/kernel/pid.c   2007-07-16 13:10:48.0 -0700
|  @@ -69,6 +69,13 @@ struct pid_namespace init_pid_ns = {
|  .last_pid = 0,
|  .child_reaper = &init_task
|   };
|  +EXPORT_SYMBOL(init_pid_ns);
|  +
|  +int is_global_init(struct task_struct *tsk)
|  +{
|  +   return tsk == init_pid_ns.child_reaper;
|  +}
|  +EXPORT_SYMBOL(is_global_init);
| 
| I don't immediately see why init_pid_ns was exported to modules.

| 
| It would need to be exported if is_global_init() was made static inline in a
| header (which seems like a sensible thing to do), but it wasn't.

It did not need to be exported in this patch.

I have a couple of follow-on patches that cleaned up some header-file
dependencies and made is_global_init() inline. Those patches are
changing a bit as I merge them with Pavel Emelianov's pid ns changes.

I will send a separate patch to inline is_global_init().

Suka


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

[ Considering this has sufficiently excited me, I became the second person
to illegitimately download 2.6.22-mm1 and am presently building Michal's
config. The strange thing is that I couldn't get 22-mm1 to even build with
the posted .config -- so had to deselect XFS, ATA, unionfs.

Hopefully this bug should be 100% reproducible at boot time anyway.
Don't care much for XFS and unionfs, but hoping deselecting ATA from
the config doesn't change the variables much in this equation. ]


On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:

On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
 On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:

  --- a/kernel/params.c
  +++ b/kernel/params.c
  @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
  kobject_set_name(&mk->kobj, name);
  kobject_init(&mk->kobj);
  ret = kobject_add(&mk->kobj);
  -   BUG_ON(ret < 0);
  +   if (ret) {
  +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
  +   "the system will be unstable now.\n", name);
  +   return;
  +   }

 It would be nice to print the value of `ret' too.



What I'm surprised about is that %eax doesn't seem to contain the
return value `ret' of kobject_add(). It's 1, which is funny, given:

ret = kobject_add(&mk->kobj);
BUG_ON(ret < 0);

One wouldn't expect BUG() -- or the corresponding exception handler --
to clobber registers, that would be a sad day.



Ok, how about this version:

---
 kernel/params.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
kobject_set_name(&mk->kobj, name);
kobject_init(&mk->kobj);
ret = kobject_add(&mk->kobj);
-   BUG_ON(ret < 0);
+   if (ret) {
+   printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
+ "error number %d\n", name, ret);
+   printk(KERN_ERR "The system will be unstable now.\n");
+   return;
+   }
param_sysfs_setup(mk, kparam, num_params, name_skip);
kobject_uevent(&mk->kobj, KOBJ_ADD);
 }



I'm building with this:

if (ret) {
   printk("~ .%s.%d.%s. ~\n", name, ret, kparam->name);
   return;
}

To also print out the evil kparam->name that caused us to crash.
When ret == EINVAL, name would be "", so not so helpful alone.

Also enabling netconsole, though I'm sure there's zero chances
of NET / ethXXX / netconsole being up _this_ early in the boot ...

Will keep you guys posted :-)

Satyam


Re: [PATCH] infiniband mlx4: potential leaks in __mlx4_ib_modify_qp

2007-07-20 Thread Roland Dreier
thanks, applied.


Re: [PATCH] Use descriptor's functions instead of inline assembly

2007-07-20 Thread Chris Wright
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
 This patch provides a new set of functions for managing the descriptor
 tables that can be used instead of putting the raw assembly in .c files.

Looks alright, some cleanups below

 Remodeling of store_tr() suggested by Frederik Deweerdt.
 
 Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
 
 diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
 index 6c34bdd..dde41d7 100644
 --- a/arch/x86_64/kernel/head64.c
 +++ b/arch/x86_64/kernel/head64.c
 @@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
  
   for (i = 0; i < IDT_ENTRIES; i++)
   set_intr_gate(i, early_idt_handler);
 - asm volatile("lidt %0" :: "m" (idt_descr));
 + load_idt((const struct desc_ptr *)&idt_descr);

No need for extra casting

   early_printk("Kernel alive\n");
  
 diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
 index 7503068..7c50a12 100644
 --- a/arch/x86_64/kernel/reboot.c
 +++ b/arch/x86_64/kernel/reboot.c
 @@ -11,6 +11,7 @@
  #include <linux/sched.h>
  #include <asm/io.h>
  #include <asm/delay.h>
 +#include <asm/desc.h>
  #include <asm/hw_irq.h>
  #include <asm/system.h>
  #include <asm/pgtable.h>
 @@ -132,7 +133,7 @@ void machine_emergency_restart(void)
   }
  
   case BOOT_TRIPLE: 
 - __asm__ __volatile__("lidt (%0)": :"r" (&no_idt));
 + load_idt((const struct desc_ptr *)&no_idt);

same here, plus opportunity for cleanup

   __asm__ __volatile__("int3");
  
   reboot_type = BOOT_KBD;
 diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
 index 1200aaa..fef7290 100644
 --- a/arch/x86_64/kernel/setup64.c
 +++ b/arch/x86_64/kernel/setup64.c
 @@ -224,8 +224,8 @@ void __cpuinit cpu_init (void)
   memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE);
  
   cpu_gdt_descr[cpu].size = GDT_SIZE;
 - asm volatile("lgdt %0" :: "m" (cpu_gdt_descr[cpu]));
 - asm volatile("lidt %0" :: "m" (idt_descr));
 + load_gdt((const struct desc_ptr *)&cpu_gdt_descr[cpu]);
 + load_idt((const struct desc_ptr *)&idt_descr);

same here

   memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
   syscall_init();
 diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c
 index b39d478..ddedadf 100644
 --- a/arch/x86_64/kernel/suspend.c
 +++ b/arch/x86_64/kernel/suspend.c
 @@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt)
   /*
* descriptor tables
*/
 - asm volatile ("sgdt %0" : "=m" (ctxt->gdt_limit));
 - asm volatile ("sidt %0" : "=m" (ctxt->idt_limit));
 - asm volatile ("str %0"  : "=m" (ctxt->tr));
 + store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
 + store_idt((struct desc_ptr *)&ctxt->idt_limit);

same here, opportunity for cleanup

 + store_tr(ctxt->tr);
  
   /* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
   /*
 @@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt)
* now restore the descriptor tables to their proper values
* ltr is done i fix_processor_context().
*/
 - asm volatile ("lgdt %0" :: "m" (ctxt->gdt_limit));
 - asm volatile ("lidt %0" :: "m" (ctxt->idt_limit));
 + load_gdt((const struct desc_ptr *)&ctxt->gdt_limit);
 + load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 + 
  
   /*
* segment registers
 diff --git a/include/asm-x86_64/desc.h b/include/asm-x86_64/desc.h
 index ac991b5..f2b0a6f 100644
 --- a/include/asm-x86_64/desc.h
 +++ b/include/asm-x86_64/desc.h
 @@ -20,6 +20,15 @@ extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
  #define load_LDT_desc() asm volatile("lldt %w0"::"r" (GDT_ENTRY_LDT*8))
  #define clear_LDT()  asm volatile("lldt %w0"::"r" (0))
  
 +static inline unsigned long __store_tr(void)
 +{
 +   unsigned long tr;
 +   asm volatile ("str %w0":"=r" (tr));
 +   return tr;
 +}

native_store_tr (although I've no objection to just fixing the interface)


Index: linus-2.6/arch/x86_64/kernel/head64.c
===
--- linus-2.6.orig/arch/x86_64/kernel/head64.c
+++ linus-2.6/arch/x86_64/kernel/head64.c
@@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * r
 
    for (i = 0; i < IDT_ENTRIES; i++)
        set_intr_gate(i, early_idt_handler);
-   load_idt((const struct desc_ptr *)&idt_descr);
+   load_idt(&idt_descr);
 
    early_printk("Kernel alive\n");
 
Index: linus-2.6/arch/x86_64/kernel/reboot.c
===
--- linus-2.6.orig/arch/x86_64/kernel/reboot.c
+++ linus-2.6/arch/x86_64/kernel/reboot.c
@@ -24,7 +24,7 @@
 void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
 
-static long no_idt[3];
+static struct desc_ptr no_idt;
 static enum { 
BOOT_TRIPLE = 't',
BOOT_KBD = 'k'
@@ -133,7 +133,7 @@ void machine_emergency_restart(void)
}
 

Re: posible latency issues in seq_read

2007-07-20 Thread Eric Dumazet

Chris Friesen wrote:

Lee Revell wrote:

On 7/20/07, Chris Friesen [EMAIL PROTECTED] wrote:



We've run into an issue (on 2.6.10) where calling lsof triggers lost
packets on our server.  Preempt is disabled, and NAPI is enabled.



Can you reproduce with a recent kernel?  Lots of latency issues have
been fixed since then.


Unfortunately I have to fix it on this version (the bug was found on 
shipped product), so if there was a difference I'd have to isolate the 
changes and backport them.  Also, I can't run the software that triggers 
the problem on a newer kernel as it has dependencies on various patches 
that are not in mainline.


Basically what I'd like to know is whether calling schedule() in 
seq_read() is safe or whether it would break assumptions made by 
seq_file users.




It won't help much. seq_read() is fine in itself.

The problem is in established_get_next() and established_get_first() not 
allowing softirq processing, while scanning a possibly huge hash table, even 
if few sockets are hashed in.


As cond_resched_softirq() was added in linux-2.6.11, you probably *need* to
check the diffs between linux-2.6.10 and linux-2.6.11


files :

include/linux/sched.h
net/core/sock.c  (__release_sock() latency)
net/ipv4/tcp_ipv4.c  (/proc/net/tcp latency)




Re: Documentation for sysfs, hotplug, and firmware loading.

2007-07-20 Thread Rob Landley
On Friday 20 July 2007 4:09:36 am Greg KH wrote:
 On Fri, Jul 20, 2007 at 09:54:01AM +0200, Cornelia Huck wrote:
  On Fri, 20 Jul 2007 00:00:01 -0700,
 
  Greg KH [EMAIL PROTECTED] wrote:
I don't insist on it, mknod insists on it.  You cannot mknod a dev
node without specifying block or char.
   
You're saying that sysfs should provide major and minor numbers
without anywhere specifying char or block, meaning the major and
minor numbers cannot be _used_.  I am insisting on getting the third
piece of information without which major and minor are useless.
   
I asked very specifically about this at OLS, several times.  What
you're telling me now seems to contradict what you told me then.
  
   Here's the rule:
 If the SUBSYSTEM is block, it's a block device.  Otherwise
 it's a char device.
 
  That's actually quite confusing to the casual reader, since:
   But also realize that the majority of events you will get have nothing
   to do with device nodes.  I think you are forgetting this fact.
 
  So the rule should be:
  If the SUBSYSTEM is block (implying major/minor are provided),
  it's a block device.
  If the SUBSYSTEM is not block, and major/minor are provided,
  it's a char device.
  If major/minor are not provided, the event/device is not
  relevant to device node creation.

 Yes, that is much more descriptive, thanks.

agreed, thanks.
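
For illustration only (this is not from any patch in the thread): the
three-way rule agreed above could be sketched in C roughly as follows. The
enum and the function name are hypothetical, and has_devnum is assumed to
mean that the event carried both a major and a minor number.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type; not a kernel or udev definition. */
enum node_kind { NODE_NONE, NODE_BLOCK, NODE_CHAR };

/*
 * subsystem:  the SUBSYSTEM value of the event
 * has_devnum: true if major/minor numbers were provided (assumption)
 */
enum node_kind classify_uevent(const char *subsystem, bool has_devnum)
{
	if (!has_devnum)
		return NODE_NONE;	/* not relevant to device node creation */
	if (subsystem && strcmp(subsystem, "block") == 0)
		return NODE_BLOCK;	/* SUBSYSTEM is block */
	return NODE_CHAR;		/* anything else with major/minor */
}
```

A hotplug helper would then only call mknod for NODE_BLOCK or NODE_CHAR and
ignore every other event.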

I'll try to post an updated version of my hotplug documentation later tonight.
(Just a _touch_ jetlagged at the moment, though.  It may only be 9:47
California time, but it's 11:47 on the east coast.  I think.)

 greg k-h

Rob
-- 
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.


Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570

2007-07-20 Thread Satyam Sharma

On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:

Hopefully this bug should be 100% reproducible at boot time anyway.
Don't care much for XFS and unionfs, but hoping deselecting ATA from
the config doesn't change the variables much in this equation. ]



Gargh! My system obviously cannot boot without libata. Guess it's
time to go through git log and see how to fix that build breakage
myself ...

Michal, how did you even manage to build / boot this kernel!



On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
 On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
  On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:
 
   --- a/kernel/params.c
   +++ b/kernel/params.c
   @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
   kobject_set_name(&mk->kobj, name);
   kobject_init(&mk->kobj);
   ret = kobject_add(&mk->kobj);
   -   BUG_ON(ret < 0);
   +   if (ret) {
   +   printk(KERN_ERR "module '%s' failed to be added to sysfs, "
   +   "the system will be unstable now.\n", name);
   +   return;
   +   }
 
  It would be nice to print the value of `ret' too.


What I'm surprised about is that %eax doesn't seem to contain the
return value `ret' of kobject_add(). It's 1, which is funny, given:

ret = kobject_add(&mk->kobj);
BUG_ON(ret < 0);

One wouldn't expect BUG() -- or the corresponding exception handler --
to clobber registers, that would be a sad day.



But I cracked this one alright. His .config has CONFIG_PROFILE_LIKELY=y
which replaces unlikely() / likely() with do_check_likely() and forces
gcc to clobber %eax with the condition itself, which in our case was
(ret < 0) == TRUE, and thus, the 1 value we saw in %eax in the
register dumps.

We should probably document somewhere that CONFIG_PROFILE_LIKELY
is not good for debugging.

Hmmm ... thinking out aloud here, but probably I don't need to fix that
libata breakage at all. I'll just put the BUG_ON(ret < 0) back in the
code, deselect PROFILE_LIKELY, and this time we _will_ have the
return of kobject_add() in %eax ...


That'll at least clear up the EEXIST vs EINVAL mystery, that'll be a
good data point, yes.

Anyway, I guess I must stop my running commentary -- will only post
after this is cleared up now :-)

Satyam


[PATCH] Kconfig: Remove top level menu Code maturity level options

2007-07-20 Thread Al Boldi

This patch removes the top level menu "Code maturity level options", and
moves its options into menu "General setup".

This makes Kconfig less cluttered and easier to set up.


Cc: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Al Boldi [EMAIL PROTECTED]

---
--- a/init/Kconfig  2007-07-09 06:38:47.0 +0300
+++ b/init/Kconfig  2007-07-21 06:42:06.0 +0300
@@ -7,7 +7,7 @@ config DEFCONFIG_LIST
    default "/boot/config-$UNAME_RELEASE"
    default "arch/$ARCH/defconfig"
 
-menu "Code maturity level options"
+menu "General setup"
 
 config EXPERIMENTAL
    bool "Prompt for development and/or incomplete code/drivers"
@@ -61,9 +61,6 @@ config INIT_ENV_ARG_LIMIT
  Maximum of each of the number of arguments and environment
  variables passed to init from the kernel command line.
 
-endmenu
-
-menu "General setup"
 
 config LOCALVERSION
    string "Local version - append to kernel release"



[PATCH 0/8] readahead cleanups and interleaved readahead take 2

2007-07-20 Thread Fengguang Wu
Linus,

To save you from some merge conflicts, I rebased this readahead patchset
to 2.6.22-git5.

The following patches are based on yesterday's discussions, compiled and
tested OK.

smaller file_ra_state:
[PATCH 1/8] compacting file_ra_state
[PATCH 2/8] mmap read-around simplification
[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos

code cleanups:
[PATCH 4/8] trivial filemap.c cleanups
[PATCH 5/8] remove several readahead macros
[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb

support of interleaved reads:
[PATCH 7/8] introduce radix_tree_scan_hole()
[PATCH 8/8] basic support of interleaved reads


The diffstat is

 block/ll_rw_blk.c          |    9 -
 fs/ext3/dir.c              |    2 -
 fs/ext4/dir.c              |    2 -
 fs/splice.c                |    2 -
 include/linux/fs.h         |   14 +++-
 include/linux/mm.h         |    2 -
 include/linux/radix-tree.h |    2 +
 lib/radix-tree.c           |   34
 mm/filemap.c               |   31 +-
 mm/readahead.c             |   58 +++
 10 files changed, 92 insertions(+), 64 deletions(-)

Regards,
Fengguang Wu
---


[PATCH 8/8] basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each others' readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect
interleaved reads _actively_, which would require a probe into the page cache
(which means a little more overhead for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples (or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
   right:
    time    stream 1    stream 2
    0       1
    1                   1001
    2       2
    3                   1002
    4       3
    5                   1003
    6       4
    7                   1004
    8       5
    9                   1005
Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead being started is high. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to ramp up and won't be disturbed by other streams.

    time    stream 1    stream 2
    0       1
    1       2
    2                   1001
    3       3
    4                   1002
    5                   1003
    6       4
    7       5
    8                   1004
    9       6
    10                  1005
    11      7
    12                  1006
    13                  1007
Here stream 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served right.
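
As a rough userspace illustration of the recovery step described above (a
sketch under stated assumptions, not the kernel code): given the offset of
the marked page and the index of the first hole found after it, the old
async_size is taken as the distance between the two and then ramped up. The
struct ra_model type and both function names are hypothetical, and the
ramp-up policy in next_ra_size() (quadruple small windows, double larger
ones, cap at max) is an assumption about get_next_ra_size().

```c
/* Hypothetical stand-in for the readahead fields used by the patch. */
struct ra_model {
	unsigned long start;		/* where readahead starts */
	unsigned int size;		/* number of readahead pages */
	unsigned int async_size;	/* pages left before async readahead */
};

/* Assumed ramp-up policy, capped at max. */
unsigned int next_ra_size(unsigned int cur, unsigned int max)
{
	unsigned int n = (cur < max / 16) ? 4 * cur : 2 * cur;

	return n < max ? n : max;
}

/*
 * offset: index of the marked page that was hit
 * hole:   first index after offset that is not in the page cache
 * Returns 1 if the state was recovered, 0 if the hole is unusable.
 */
int recover_ra(struct ra_model *ra, unsigned long offset,
	       unsigned long hole, unsigned int max)
{
	if (!hole || hole - offset > max)
		return 0;

	ra->start = hole;
	/* hole - offset is the old async_size; ramp it up */
	ra->size = next_ra_size((unsigned int)(hole - offset), max);
	ra->async_size = ra->size;
	return 1;
}
```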

Cc: Nick Piggin [EMAIL PROTECTED]
Cc: Rusty Russell [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -371,6 +371,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+       pgoff_t start;
+
+       read_lock_irq(&mapping->tree_lock);
+       start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+       read_unlock_irq(&mapping->tree_lock);
+
+       if (!start || start - offset > max)
+           return 0;
+
+       ra->start = start;
+       ra->size = start - offset;  /* old async_size */
+       ra->size = get_next_ra_size(ra, max);
+       ra->async_size = ra->size;
+       goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -381,16 +404,6 @@ ondemand_readahead(struct address_space 
    ra->size = get_init_ra_size(req_size, max);
    ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-       ra->start++;
-       ra->size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }



[PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when
a lot of files are opened.
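
A small userspace sketch of where the saving comes from (illustration only;
the two struct names are hypothetical and only model the three fields the
patch narrows plus the start index). On a typical LP64 system unsigned long
is 8 bytes and unsigned int is 4, so the narrowed layout should be smaller;
on 32-bit systems the two are usually the same size.

```c
#include <stddef.h>

/* Layout before the patch: every field is an unsigned long. */
struct ra_old_model {
	unsigned long start;
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
};

/* Layout after the patch: the sizes become unsigned int. */
struct ra_new_model {
	unsigned long start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
};

/* Bytes saved per instance on the platform this is compiled for. */
size_t ra_bytes_saved(void)
{
	return sizeof(struct ra_old_model) - sizeof(struct ra_new_model);
}
```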

CC: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |8 
 mm/readahead.c |2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -697,12 +697,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-   pgoff_t start;  /* where readahead started */
-   unsigned long size; /* # of readahead pages */
-   unsigned long async_size;   /* do asynchronous readahead when
+   pgoff_t start;  /* where readahead started */
+   unsigned int size;  /* # of readahead pages */
+   unsigned int async_size;/* do asynchronous readahead when
   there are only # of pages ahead */
 
-   unsigned long ra_pages; /* Maximum readahead window */
+   unsigned int ra_pages;  /* Maximum readahead window */
unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -350,7 +350,7 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   unsigned long max;  /* max readahead pages */
+   int max;/* max readahead pages */
int sequential;
 
max = ra->ra_pages;



[PATCH 5/8] remove several readahead macros

2007-07-20 Thread Fengguang Wu
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/mm.h |2 --
 mm/readahead.c |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/mm.h
+++ linux-2.6.22-git15/include/linux/mm.h
@@ -1136,8 +1136,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD   128 /* kbytes */
 #define VM_MIN_READAHEAD   16  /* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT   256 /* max pages in a row in cache before
-* turning readahead off */
 
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);
 
-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define MAX_RA_PAGES   (VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define MIN_RA_PAGES   DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-   .ra_pages   = MAX_RA_PAGES,
+   .ra_pages   = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state  = 0,
.capabilities   = BDI_CAP_MAP_COPY,
.unplug_io_fn   = default_unplug_io_fn,



[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb

2007-07-20 Thread Fengguang Wu
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable. Especially when max_sectors_kb cannot
grow larger than max_hw_sectors_kb, which can be rather small for some disk
drives.

Cc: Jens Axboe [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
Acked-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c |9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-git15.orig/block/ll_rw_blk.c
+++ linux-2.6.22-git15/block/ll_rw_blk.c
@@ -3946,7 +3946,6 @@ queue_max_sectors_store(struct request_q
            max_hw_sectors_kb = q->max_hw_sectors >> 1,
            page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
    ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-   int ra_kb;
 
    if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
        return -EINVAL;
@@ -3955,14 +3954,6 @@ queue_max_sectors_store(struct request_q
     * values synchronously:
     */
    spin_lock_irq(q->queue_lock);
-   /*
-    * Trim readahead window as well, if necessary:
-    */
-   ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-   if (ra_kb > max_sectors_kb)
-       q->backing_dev_info.ra_pages =
-               max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
    q->max_sectors = max_sectors_kb << 1;
    spin_unlock_irq(q->queue_lock);
 



[PATCH 7/8] introduce radix_tree_scan_hole()

2007-07-20 Thread Fengguang Wu
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) a possible smarter one in the future.
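
To make the intended semantics concrete, here is a userspace model of the
scan (illustration only, not the kernel implementation): a plain bool array
stands in for the radix tree, and indexes at or beyond the array length are
treated as holes. The function name scan_hole_model and the array
representation are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * present[i] tells whether index i holds an item; n is the array length.
 * Scans forward from index and stops at the first hole, on wrap-around
 * to index 0, or after max_scan items. Returns the index where it stopped.
 */
unsigned long scan_hole_model(const bool *present, size_t n,
			      unsigned long index, unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (index >= n || !present[index])
			break;		/* hit a hole */
		if (++index == 0)
			break;		/* wrapped around */
	}

	return index;
}
```

Note that the return value alone does not say whether a hole was actually
found or max_scan was exhausted; a caller has to compare the distance
scanned against its own limit.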

Cc: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---

 include/linux/radix-tree.h |2 ++
 lib/radix-tree.c   |   34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-git15.orig/include/linux/radix-tree.h
+++ linux-2.6.22-git15/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-git15.orig/lib/radix-tree.c
+++ linux-2.6.22-git15/lib/radix-tree.c
@@ -599,6 +599,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif
 
+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   unsigned long i;
+
+   for (i = 0; i < max_scan; i++) {
+   if (!radix_tree_lookup(root, index))
+   break;
+   if (++index == 0)
+   break;
+   }
+
+   return index;
+}
+
+/**
+ * radix_tree_scan_hole - scan for hole
+ * @root:  radix tree root
+ * @index: index key
+ * @max_scan:  advice on max items to scan (it may scan a little more)
+ *
+ *  Scan forward from @index for a hole/empty item, stop when
+ *  - hit hole
+ *  - wrap-around to index 0
+ *  - @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
unsigned int max_items, unsigned long *next_index)



[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos

2007-07-20 Thread Fengguang Wu
Combine the file_ra_state members
unsigned long prev_index
unsigned int prev_offset
into
loff_t prev_pos

It is more consistent and better supports huge files.
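
A minimal sketch of the packing this implies, assuming 4 KiB pages (the
MODEL_* macros and the three function names are hypothetical and exist only
for this illustration): the page index goes in the high bits and the
in-page offset in the low bits of one 64-bit position.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12			/* assumed: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Combine a page index and an in-page offset into one byte position. */
int64_t pack_prev_pos(uint64_t index, unsigned int offset)
{
	return (int64_t)((index << MODEL_PAGE_SHIFT) |
			 (offset & (MODEL_PAGE_SIZE - 1)));
}

/* Recover the page index from the combined position. */
uint64_t prev_pos_index(int64_t pos)
{
	return (uint64_t)pos >> MODEL_PAGE_SHIFT;
}

/* Recover the in-page offset from the combined position. */
unsigned int prev_pos_offset(int64_t pos)
{
	return (unsigned int)((uint64_t)pos & (MODEL_PAGE_SIZE - 1));
}
```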

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra [EMAIL PROTECTED]
Cc: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 fs/ext3/dir.c  |2 +-
 fs/ext4/dir.c  |2 +-
 fs/splice.c|2 +-
 include/linux/fs.h |3 +--
 mm/filemap.c   |   11 ++-
 mm/readahead.c |   15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -704,8 +704,7 @@ struct file_ra_state {
 
unsigned int ra_pages;  /* Maximum readahead window */
int mmap_miss;  /* Cache miss stat for mmap accesses */
-   unsigned long prev_index;   /* Cache last read() position */
-   unsigned int prev_offset;   /* Offset where last read() ended in a page */
+   loff_t prev_pos;/* Cache last read() position */
 };
 
 /*
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -879,8 +879,8 @@ void do_generic_mapping_read(struct addr
cached_page = NULL;
    index = *ppos >> PAGE_CACHE_SHIFT;
    next_index = index;
-   prev_index = ra.prev_index;
-   prev_offset = ra.prev_offset;
+   prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+   prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
    last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
    offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -966,7 +966,6 @@ page_ok:
    index += offset >> PAGE_CACHE_SHIFT;
    offset &= ~PAGE_CACHE_MASK;
    prev_offset = offset;
-   ra.prev_offset = offset;
 
    page_cache_release(page);
    if (ret == nr && desc->count)
@@ -1056,7 +1055,9 @@ no_cached_page:
 
 out:
*_ra = ra;
-   _ra->prev_index = prev_index;
+   _ra->prev_pos = prev_index;
+   _ra->prev_pos <<= PAGE_CACHE_SHIFT;
+   _ra->prev_pos |= prev_offset;
 
    *ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
if (cached_page)
@@ -1415,7 +1416,7 @@ retry_find:
 * Found the page and have a reference on it.
 */
mark_page_accessed(page);
-   ra->prev_index = page->index;
+   ra->prev_pos = page->index << PAGE_CACHE_SHIFT;
    vmf->page = page;
return ret | VM_FAULT_LOCKED;
 
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
    ra->ra_pages = mapping->backing_dev_info->ra_pages;
-   ra->prev_index = -1;
+   ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
@@ -326,7 +326,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -350,11 +350,9 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   int max;    /* max readahead pages */
-   int sequential;
-
-   max = ra->ra_pages;
-   sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+   int max = ra->ra_pages; /* max readahead pages */
+   pgoff_t prev_offset;
+   int sequential;
 
/*
 * It's the expected callback offset, assume sequential access.
@@ -368,6 +366,9 @@ ondemand_readahead(struct address_space 
goto readit;
}
 
+   prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+   sequential = offset - prev_offset <= 1UL || req_size > max;
+
/*
 * Standalone, small read.
 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-git15.orig/fs/ext3/dir.c
+++ linux-2.6.22-git15/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
            sb->s_bdev->bd_inode->i_mapping,
            &filp->f_ra, filp,
            index, 1);
-   filp->f_ra.prev_index = index;
+   filp->f_ra.prev_pos = index << PAGE_CACHE_SHIFT;
    bh = ext3_bread(NULL, inode, blk, 0, &err);
}
 
--- 

[PATCH 4/8] trivial filemap.c cleanups

2007-07-20 Thread Fengguang Wu
- remove unused local next_index in do_generic_mapping_read()
- convert some 'unsigned long' to pgoff_t
- wrap a long line

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/filemap.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -866,11 +866,10 @@ void do_generic_mapping_read(struct addr
 read_actor_t actor)
 {
    struct inode *inode = mapping->host;
-   unsigned long index;
-   unsigned long offset;
-   unsigned long last_index;
-   unsigned long next_index;
-   unsigned long prev_index;
+   pgoff_t index;
+   pgoff_t offset;
+   pgoff_t last_index;
+   pgoff_t prev_index;
unsigned int prev_offset;
struct page *cached_page;
int error;
@@ -878,7 +877,6 @@ void do_generic_mapping_read(struct addr
 
cached_page = NULL;
    index = *ppos >> PAGE_CACHE_SHIFT;
-   next_index = index;
    prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
    prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
    last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
@@ -1219,7 +1217,8 @@ out:
 }
 EXPORT_SYMBOL(generic_file_aio_read);
 
-int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
+int file_send_actor(read_descriptor_t * desc, struct page *page,
+   unsigned long offset, unsigned long size)
 {
    ssize_t written;
    unsigned long count = desc->count;
@@ -1272,7 +1271,6 @@ asmlinkage ssize_t sys_readahead(int fd,
 }
 
 #ifdef CONFIG_MMU
-static int FASTCALL(page_cache_read(struct file * file, unsigned long offset));
 /**
  * page_cache_read - adds requested page to the page cache if not already there
  * @file:  file to read
@@ -1281,7 +1279,7 @@ static int FASTCALL(page_cache_read(stru
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int fastcall page_cache_read(struct file * file, unsigned long offset)
+static int fastcall page_cache_read(struct file * file, pgoff_t offset)
 {
    struct address_space *mapping = file->f_mapping;
struct page *page; 



[PATCH 2/8] mmap read-around simplification

2007-07-20 Thread Fengguang Wu
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.
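
One way to see why a single signed counter is enough (a sketch, not the
kernel code): the old test compares misses against hits plus a threshold,
which is the same as comparing the net value (misses minus hits) against
the threshold. The function names are hypothetical, and the value 100 for
the threshold is an assumption standing in for MMAP_LOTSAMISS.

```c
#define MODEL_LOTSAMISS 100	/* assumed stand-in for MMAP_LOTSAMISS */

/* Old scheme: two counters, compared on every fault. */
int too_many_misses_old(unsigned long miss, unsigned long hit)
{
	return miss > hit + MODEL_LOTSAMISS;
}

/*
 * New scheme: one signed counter that is incremented on a miss and
 * decremented on a hit, so it holds miss - hit.
 */
int too_many_misses_new(int net_miss)
{
	return net_miss > MODEL_LOTSAMISS;
}
```

The counter has to be signed in this scheme because a run of hits can push
it below zero.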

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |3 +--
 mm/filemap.c   |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -703,8 +703,7 @@ struct file_ra_state {
   there are only # of pages ahead */
 
unsigned int ra_pages;  /* Maximum readahead window */
-   unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
-   unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
+   int mmap_miss;  /* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
    unsigned int prev_offset;   /* Offset where last read() ended in a page */
 };
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -1369,7 +1369,7 @@ retry_find:
 * Do we miss much more than hit in this file? If so,
 * stop bothering with read-ahead. It will only hurt.
 */
-   if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+   if (ra->mmap_miss > MMAP_LOTSAMISS)
goto no_cached_page;
 
/*
@@ -1395,7 +1395,7 @@ retry_find:
}
 
if (!did_readaround)
-       ra->mmap_hit++;
+       ra->mmap_miss--;
 
/*
 * We have a locked page in the page cache, now we need to check



Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Sorry, forgot to prefix the patch titles with [readahead].
Should I repost?



Re: [ofa-general] [PATCH 1/5] ehca: Supports large page MRs

2007-07-20 Thread Roland Dreier
I applied this, but I agree with checkpatch.pl:

  WARNING: externs should be avoided in .c files
  #227: FILE: drivers/infiniband/hw/ehca/ehca_mrmw.c:67:
  +extern int ehca_mr_largepage;
  
  WARNING: externs should be avoided in .c files
  #949: FILE: drivers/infiniband/hw/ehca/hcp_if.c:753:
  +extern int ehca_debug_level;

if you need to use a variable in more than one .c file, put the extern
declaration in a common header that's included everywhere you use the
variable, including the .c file that it is defined in.  That way the
compiler can see if you get confused about the type of the variable.
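
A compact illustration of that layout, collapsed into one listing so it can
be compiled as a single unit (the header name ehca_params.h and the accessor
get_debug_level() are hypothetical; only the variable name comes from the
warning above):

```c
/* --- shared header, e.g. a hypothetical ehca_params.h --- */
extern int ehca_debug_level;
int get_debug_level(void);

/* --- the .c file that defines the variable also includes the header --- */
int ehca_debug_level = 0;	/* a different type here would fail to compile */

/* --- any other .c file that includes the header --- */
int get_debug_level(void)
{
	return ehca_debug_level;
}
```

Because the defining file sees the extern declaration too, a mismatch such
as declaring it int in the header and defining it as long is reported as
conflicting types instead of silently linking.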

When you get a chance, please post a follow-on patch to fix this.

 - R.


Re: [PATCH 2/5] ehca: Generate event when SRQ limit reached

2007-07-20 Thread Roland Dreier
thanks, applied.

BTW, does your SRQ-capable hardware support generating the last WQE
reached event?  There's not any reliable way to avoid problems when
destroying QPs attached to an SRQ without it, and the IB spec requires
CAs that support SRQs to generate it (o11-5.2.5 in chapter 11 of vol 1).

I don't see any code in ehca to generate the event, and IPoIB CM at
least will be very unhappy when using SRQs if the event is not
generated.

 - R.


Re: [PATCH 3/5] ehca: Make ehca2ib_return_code() non-inline

2007-07-20 Thread Roland Dreier
thanks, applied


Fixing labels after GNU indent (Re: [PATCH 1/2] run scripts/Lindent on it to match Documentation/CodingStyle)

2007-07-20 Thread Oleg Verych
[]
   sed -i -e 's/^\t*  \(\w*:\)/ \1/' $@
  
   which will replace the leading tabs and spaces with one space.
   It should leave case labels unmolested, as they should be indented with
   tabs, not 6 spaces.
  
   Any regexp ninjas want to have a go at something better?
  
  I'm the one. Trying to write portable, optimized and easy to
  understand scripts [0].
  
  Please, describe more what must be done, and i will do it. Case labels
  are handled very strangely in your example.
 
 OK.  indent will indent labels to a column number that's a multiple of
 8, plus 6.  So it may start in column 6, 14, 22, 30, etc.  I'm not quite
 sure what the definition of a label is; I had it as \w*: up there, but I
 don't know if that would match the _.  The point is to *not* handle case
 labels, only goto labels.

t=`printf '\t'`
sed -i "s_^\($t*\)  *\([^:]*:\)_\1\2_" $@

I'm not sure about leaving one space in the replacement. As written, the
command removes the spaces between the line start (nothing or tabs, i.e.
the supposedly correct indentation) and a label, i.e. `label_name:' with
no space before the colon. If `label_name' is actually not a label, let's
leave that kind of breakage to the compiler.

The variable $t is used for readability of the regex and because POSIX
BREs leave the meaning of most characters after a backslash undefined;
POSIX sed defines only \n as a newline, so \t is not portable.
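
To make the effect of that substitution easier to reason about, here is a rough C model of it for a single line. The function name strip_label_spaces() is invented for this sketch, and it only mirrors the regex above; it is not a proposed replacement for the sed command.

```c
#include <string.h>

/*
 * Model of: s_^\(<tabs>\)  *\([^:]*:\)_\1\2_
 * Leading tabs are kept, the run of spaces after them is dropped,
 * but only if a ':' appears somewhere later on the line.
 * Returns 1 if the line was changed, 0 otherwise.
 */
int strip_label_spaces(char *line)
{
	char *p = line;
	char *spaces;

	while (*p == '\t')		/* \(<tabs>\) */
		p++;
	spaces = p;
	while (*p == ' ')		/* one or more spaces */
		p++;
	if (p == spaces)
		return 0;		/* no space after the tabs: no match */
	if (!strchr(p, ':'))
		return 0;		/* [^:]*: needs a colon further on */

	memmove(spaces, p, strlen(p) + 1);
	return 1;
}
```

Tab-indented case labels are left alone, as intended, but any space-indented line containing a colon matches as well (a ternary expression, for instance), so the output still wants a review.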

--
-o--=O`C
 #oo'L O
___=E M


Re: [ofa-general] [PATCH 5/5] ehca: Support small QP queues

2007-07-20 Thread Roland Dreier
thanks, applied.  I fixed this up myself to work with commit 20c2df83,
which got rid of the destructor argument to kmem_cache_create() -- you
probably want to check my tree to make sure it's OK.

Also the same as I said before about checkpatch.pl's warning:

WARNING: externs should be avoided in .c files
#337: FILE: drivers/infiniband/hw/ehca/ehca_pd.c:91:
+   extern struct kmem_cache *small_qp_cache;

please fix that up when you get a chance


Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Linus Torvalds


On Sat, 21 Jul 2007, Fengguang Wu wrote:

 Sorry, forgot to prefix the patch titles with [readahead].
 Should I repost?

Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
even if it does mean missing the merge window this time around. 

Linus


Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Fengguang Wu
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
 
 
 On Sat, 21 Jul 2007, Fengguang Wu wrote:
 
  Sorry, forgot to prefix the patch titles with [readahead].
  Should I repost?
 
 Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
 even if it does mean missing the merge window this time around. 

OK. Let me repost it...



[PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on max_readahead_kb

2007-07-20 Thread Fengguang Wu
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable. Especially when max_sectors_kb cannot
grow larger than max_hw_sectors_kb, which can be rather small for some disk
drives.

Cc: Jens Axboe [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
Acked-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c |9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/block/ll_rw_blk.c
+++ linux-2.6.22-rc6-mm1/block/ll_rw_blk.c
@@ -3945,7 +3945,6 @@ queue_max_sectors_store(struct request_q
max_hw_sectors_kb = q->max_hw_sectors >> 1,
page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-   int ra_kb;
 
if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
return -EINVAL;
@@ -3954,14 +3953,6 @@ queue_max_sectors_store(struct request_q
 * values synchronously:
 */
spin_lock_irq(q->queue_lock);
-   /*
-* Trim readahead window as well, if necessary:
-*/
-   ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-   if (ra_kb > max_sectors_kb)
-   q->backing_dev_info.ra_pages =
-   max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
q->max_sectors = max_sectors_kb << 1;
spin_unlock_irq(q->queue_lock);
 

--


[PATCH 4/7] readahead: remove several readahead macros

2007-07-20 Thread Fengguang Wu
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/mm.h |2 --
 mm/readahead.c |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/mm.h
+++ linux-2.6.22-rc6-mm1/include/linux/mm.h
@@ -1148,8 +1148,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD   128 /* kbytes */
 #define VM_MIN_READAHEAD   16  /* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT   256 /* max pages in a row in cache before
-* turning readahead off */
 
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);
 
-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define MAX_RA_PAGES   (VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define MIN_RA_PAGES   DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-   .ra_pages   = MAX_RA_PAGES,
+   .ra_pages   = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state  = 0,
.capabilities   = BDI_CAP_MAP_COPY,
.unplug_io_fn   = default_unplug_io_fn,

--


[PATCH 7/7] readahead: basic support of interleaved reads

2007-07-20 Thread Fengguang Wu
This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each others' readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect
interleaved reads _actively_, which would require a probe into the page cache
(and so a little more overhead for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples(or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
   right:
    time    stream 1    stream 2
    0       1
    1                   1001
    2       2
    3                   1002
    4       3
    5                   1003
    6       4
    7                   1004
    8       5
    9                   1005
Here no single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead being started is huge. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to ramp up and won't be disturbed by other streams.

    time    stream 1    stream 2
    0       1
    1       2
    2                   1001
    3       3
    4                   1002
    5                   1003
    6       4
    7       5
    8                   1004
    9       6
    10                  1005
    11      7
    12                  1006
    13                  1007
Here stream 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served right.
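
The recovery step can be pictured with a small userspace model. This is only a sketch under stated assumptions: the page cache is a plain bool array, struct ra_model and recover_ra_state() are invented names, and next_ra_size() assumes a simple "double, capped at max" ramp-up rather than the kernel's real get_next_ra_size().

```c
#include <stdbool.h>

#define NR_PAGES 64	/* size of the toy page cache */

struct ra_model {
	unsigned long start;		/* where the next readahead starts */
	unsigned long size;		/* pages to read ahead */
	unsigned long async_size;
};

/* Assumed ramp-up rule: double the window, capped at max. */
static unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
	return 2 * cur < max ? 2 * cur : max;
}

/*
 * Model of the hit_readahead_marker branch: cached[i] says whether page i
 * is in the toy page cache. Returns true when readahead state was recovered.
 * (The toy array cannot wrap around, so the patch's "!start" check is omitted.)
 */
bool recover_ra_state(const bool *cached, unsigned long offset,
		      unsigned long max, struct ra_model *ra)
{
	unsigned long start = offset;
	unsigned long scanned;

	/* scan forward for the first hole, looking at no more than max+1 pages */
	for (scanned = 0; scanned < max + 1 && start < NR_PAGES; scanned++) {
		if (!cached[start])
			break;
		start++;
	}

	if (start - offset > max)	/* no hole nearby: give up */
		return false;

	ra->start = start;
	/* start - offset is what is left of the old window (old async_size) */
	ra->size = next_ra_size(start - offset, max);
	ra->async_size = ra->size;
	return true;
}
```

With pages 10..17 cached, a marker hit at page 10 and max = 32, the model restarts readahead at page 18 with a 16-page window.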

Cc: Rusty Russell [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -363,6 +363,29 @@ ondemand_readahead(struct address_space 
}
 
/*
+* Hit a marked page without valid readahead state.
+* E.g. interleaved reads.
+* Query the pagecache for async_size, which normally equals to
+* readahead size. Ramp it up and use it as the new readahead size.
+*/
+   if (hit_readahead_marker) {
+   pgoff_t start;
+
+   read_lock_irq(&mapping->tree_lock);
+   start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+   read_unlock_irq(&mapping->tree_lock);
+
+   if (!start || start - offset > max)
+   return 0;
+
+   ra->start = start;
+   ra->size = start - offset;  /* old async_size */
+   ra->size = get_next_ra_size(ra, max);
+   ra->async_size = ra->size;
+   goto readit;
+   }
+
+   /*
 * It may be one of
 *  - first read on start of file
 *  - sequential cache miss
@@ -373,16 +396,6 @@ ondemand_readahead(struct address_space 
ra->size = get_init_ra_size(req_size, max);
ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-   /*
-* Hit on a marked page without valid readahead state.
-* E.g. interleaved reads.
-* Not knowing its readahead pos/size, bet on the minimal possible one.
-*/
-   if (hit_readahead_marker) {
-   ra->start++;
-   ra->size = get_next_ra_size(ra, max);
-   }
-
 readit:
return ra_submit(ra, mapping, filp);
 }

--


[PATCH 1/7] readahead: compacting file_ra_state

2007-07-20 Thread Fengguang Wu
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when
a lot of files are opened.
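
A quick way to see the saving is to compare the two layouts with sizeof. The struct names below are invented for the comparison, and the field lists are copied from the diff that follows; this is a sketch, not kernel code.

```c
#include <stddef.h>

typedef unsigned long pgoff_t;	/* as in the kernel */

/* Layout before this patch */
struct file_ra_state_before {
	pgoff_t start;
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
	unsigned long mmap_hit;
	unsigned long mmap_miss;
	unsigned long prev_index;
	unsigned int prev_offset;
};

/* Layout after this patch: three fields shrink to unsigned int */
struct file_ra_state_after {
	pgoff_t start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
	unsigned long mmap_hit;
	unsigned long mmap_miss;
	unsigned long prev_index;
	unsigned int prev_offset;
};

/* Bytes saved per struct on whatever target this is compiled for */
size_t ra_state_bytes_saved(void)
{
	return sizeof(struct file_ra_state_before) -
	       sizeof(struct file_ra_state_after);
}
```

On a typical LP64 target the saving from this patch alone should be modest, since alignment padding absorbs part of it; on 32-bit targets the two layouts are the same size.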

CC: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |8 
 mm/filemap.c   |2 +-
 mm/readahead.c |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -771,12 +771,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-   pgoff_t start;  /* where readahead started */
-   unsigned long size; /* # of readahead pages */
-   unsigned long async_size;   /* do asynchronous readahead when
+   pgoff_t start;  /* where readahead started */
+   unsigned int size;  /* # of readahead pages */
+   unsigned int async_size;/* do asynchronous readahead when
   there are only # of pages ahead */
 
-   unsigned long ra_pages; /* Maximum readahead window */
+   unsigned int ra_pages;  /* Maximum readahead window */
unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -840,7 +840,7 @@ static void shrink_readahead_size_eio(st
if (count > 5)
return;
count++;
-   printk(KERN_WARNING "Reducing readahead size to %luK\n",
+   printk(KERN_WARNING "Reducing readahead size to %dK\n",
ra->ra_pages << (PAGE_CACHE_SHIFT - 10));
 }
 
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -342,7 +342,7 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   unsigned long max;  /* max readahead pages */
+   int max;/* max readahead pages */
int sequential;
 
max = ra->ra_pages;

--


[PATCH 6/7] radixtree: introduce radix_tree_scan_hole()

2007-07-20 Thread Fengguang Wu
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) a possible smarter one in the future.
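
For playing with the stop conditions outside the kernel, the loop can be modeled with the tree replaced by a predicate. The names scan_hole_model(), present_fn and the two toy predicates are invented for this sketch; the loop body mirrors radix_tree_scan_hole_dumb() in the patch below.

```c
#include <stdbool.h>

/* Stand-in for radix_tree_lookup(): reports whether an index is populated. */
typedef bool (*present_fn)(unsigned long index);

/* Same loop as the dumb scan, with the radix tree replaced by a predicate. */
unsigned long scan_hole_model(present_fn present, unsigned long index,
			      unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (!present(index))
			break;		/* found a hole */
		if (++index == 0)
			break;		/* wrapped around to index 0 */
	}
	return index;
}

/* Two toy "trees" to experiment with */
bool first_ten_present(unsigned long index) { return index < 10; }
bool everything_present(unsigned long index) { (void)index; return true; }
```

Two return values deserve attention from callers: index + max_scan means no hole was found within the limit, and 0 means the scan wrapped around.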

Cc: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---

 include/linux/radix-tree.h |2 ++
 lib/radix-tree.c   |   34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-rc6-mm1.orig/include/linux/radix-tree.h
+++ linux-2.6.22-rc6-mm1/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-rc6-mm1.orig/lib/radix-tree.c
+++ linux-2.6.22-rc6-mm1/lib/radix-tree.c
@@ -601,6 +601,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif
 
+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   unsigned long i;
+
+   for (i = 0; i < max_scan; i++) {
+   if (!radix_tree_lookup(root, index))
+   break;
+   if (++index == 0)
+   break;
+   }
+
+   return index;
+}
+
+/**
+ * radix_tree_scan_hole - scan for hole
+ * @root:  radix tree root
+ * @index: index key
+ * @max_scan:  advice on max items to scan (it may scan a little more)
+ *
+ *  Scan forward from @index for a hole/empty item, stop when
+ *  - hit hole
+ *  - wrap-around to index 0
+ *  - @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+   unsigned long index, unsigned long max_scan)
+{
+   return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
unsigned int max_items, unsigned long *next_index)

--


[PATCH 2/7] readahead: mmap read-around simplification

2007-07-20 Thread Fengguang Wu
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.
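
Why one signed counter is enough can be checked with a small model: a hit that used to bump mmap_hit now cancels one miss instead. The struct and function names are invented for this sketch, and MMAP_LOTSAMISS = 100 is an assumption about the constant in mm/filemap.c.

```c
#include <stdbool.h>

#define MMAP_LOTSAMISS 100	/* assumed value of the mm/filemap.c constant */

struct old_stats { unsigned long mmap_hit, mmap_miss; };
struct new_stats { int mmap_miss; };

/* Before: two counters, compared against each other. */
void old_hit(struct old_stats *s)  { s->mmap_hit++; }
void old_miss(struct old_stats *s) { s->mmap_miss++; }
bool old_skip_readaround(const struct old_stats *s)
{
	return s->mmap_miss > s->mmap_hit + MMAP_LOTSAMISS;
}

/* After: a hit cancels one miss, so a single signed counter carries both. */
void new_hit(struct new_stats *s)  { s->mmap_miss--; }
void new_miss(struct new_stats *s) { s->mmap_miss++; }
bool new_skip_readaround(const struct new_stats *s)
{
	return s->mmap_miss > MMAP_LOTSAMISS;
}
```

Feeding the same sequence of hits and misses to both versions gives the same decision at every step, as long as neither counter overflows; the counter has to be signed because hits can outnumber misses.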

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |3 +--
 mm/filemap.c   |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -777,8 +777,7 @@ struct file_ra_state {
   there are only # of pages ahead */
 
unsigned int ra_pages;  /* Maximum readahead window */
-   unsigned long mmap_hit; /* Cache hit stat for mmap accesses */
-   unsigned long mmap_miss;/* Cache miss stat for mmap accesses */
+   int mmap_miss;  /* Cache miss stat for mmap accesses */
unsigned long prev_index;   /* Cache last read() position */
unsigned int prev_offset;   /* Offset where last read() ended in a page */
 };
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -1389,7 +1389,7 @@ retry_find:
 * Do we miss much more than hit in this file? If so,
 * stop bothering with read-ahead. It will only hurt.
 */
-   if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+   if (ra->mmap_miss > MMAP_LOTSAMISS)
goto no_cached_page;
 
/*
@@ -1415,7 +1415,7 @@ retry_find:
}
 
if (!did_readaround)
-   ra->mmap_hit++;
+   ra->mmap_miss--;
 
/*
 * We have a locked page in the page cache, now we need to check

--


[PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset into prev_pos

2007-07-20 Thread Fengguang Wu
Combine the file_ra_state members
unsigned long prev_index
unsigned int prev_offset
into
loff_t prev_pos

It is more consistent and better supports huge files.
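
The packing itself is just a shift and a mask. A minimal sketch, assuming 4 KiB pages (PAGE_CACHE_SHIFT = 12) and using the invented names loff_t_model, pack_prev_pos(), prev_pos_index() and prev_pos_offset():

```c
#define PAGE_CACHE_SHIFT 12			/* assumed: 4 KiB pages */
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

typedef long long loff_t_model;			/* stand-in for the kernel's loff_t */

/* Combine a page index and an in-page offset into one byte position. */
loff_t_model pack_prev_pos(unsigned long index, unsigned int offset)
{
	/* widen before shifting so a large index is not truncated on 32-bit */
	return ((loff_t_model)index << PAGE_CACHE_SHIFT) | offset;
}

/* Page index of a byte position */
unsigned long prev_pos_index(loff_t_model pos)
{
	return (unsigned long)(pos >> PAGE_CACHE_SHIFT);
}

/* Offset of a byte position within its page */
unsigned int prev_pos_offset(loff_t_model pos)
{
	return (unsigned int)(pos & (PAGE_CACHE_SIZE - 1));
}
```

The widening cast matters wherever the index is a 32-bit unsigned long: shifting it left first would lose the high bits for files beyond 4 GB.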

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 fs/ext3/dir.c  |2 +-
 fs/ext4/dir.c  |2 +-
 fs/splice.c|2 +-
 include/linux/fs.h |3 +--
 mm/filemap.c   |   11 ++-
 mm/readahead.c |   15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -778,8 +778,7 @@ struct file_ra_state {
 
unsigned int ra_pages;  /* Maximum readahead window */
int mmap_miss;  /* Cache miss stat for mmap accesses */
-   unsigned long prev_index;   /* Cache last read() position */
-   unsigned int prev_offset;   /* Offset where last read() ended in a page */
+   loff_t prev_pos;/* Cache last read() position */
 };
 
 /*
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -881,8 +881,8 @@ void do_generic_mapping_read(struct addr
 
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
-   prev_index = ra.prev_index;
-   prev_offset = ra.prev_offset;
+   prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+   prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -968,7 +968,6 @@ page_ok:
index += offset >> PAGE_CACHE_SHIFT;
offset &= ~PAGE_CACHE_MASK;
prev_offset = offset;
-   ra.prev_offset = offset;
 
page_cache_release(page);
if (ret == nr && desc->count)
@@ -1055,7 +1054,9 @@ no_cached_page:
 
 out:
*_ra = ra;
-   _ra->prev_index = prev_index;
+   _ra->prev_pos = prev_index;
+   _ra->prev_pos <<= PAGE_CACHE_SHIFT;
+   _ra->prev_pos |= prev_offset;
 
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
if (filp)
@@ -1435,7 +1436,7 @@ retry_find:
 * Found the page and have a reference on it.
 */
mark_page_accessed(page);
-   ra->prev_index = page->index;
+   ra->prev_pos = page->index << PAGE_CACHE_SHIFT;
return page;
 
 outside_data_content:
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
ra->ra_pages = mapping->backing_dev_info->ra_pages;
-   ra->prev_index = -1;
+   ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
@@ -318,7 +318,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -342,11 +342,9 @@ ondemand_readahead(struct address_space 
   bool hit_readahead_marker, pgoff_t offset,
   unsigned long req_size)
 {
-   int max;/* max readahead pages */
-   int sequential;
-
-   max = ra->ra_pages;
-   sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+   int max = ra->ra_pages; /* max readahead pages */
+   pgoff_t prev_offset;
+   int sequential;
 
/*
 * It's the expected callback offset, assume sequential access.
@@ -360,6 +358,9 @@ ondemand_readahead(struct address_space 
goto readit;
}
 
+   prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+   sequential = offset - prev_offset <= 1UL || req_size > max;
+
/*
 * Standalone, small read.
 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-rc6-mm1.orig/fs/ext3/dir.c
+++ linux-2.6.22-rc6-mm1/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
sb->s_bdev->bd_inode->i_mapping,
&filp->f_ra, filp,
index, 1);
-   filp->f_ra.prev_index = index;
+   filp->f_ra.prev_pos = index << PAGE_CACHE_SHIFT;
bh = ext3_bread(NULL, inode, blk, 0, &err);
}
 
--- linux-2.6.22-rc6-mm1.orig/fs/ext4/dir.c
+++ linux-2.6.22-rc6-mm1/fs/ext4/dir.c
@@ -142,7 +142,7 @@ static 

[PATCH 0/7] readahead cleanups and interleaved readahead take 3

2007-07-20 Thread Fengguang Wu
Andrew,

The following patches are based on yesterday's discussions, compiled and
tested OK:

smaller file_ra_state:
[PATCH 1/7] readahead: compacting file_ra_state
[PATCH 2/7] readahead: mmap read-around simplification
[PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset 
into prev_

code cleanups:
[PATCH 4/7] readahead: remove several readahead macros
[PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on 
max_readahead_kb

support of interleaved reads:
[PATCH 6/7] radixtree: introduce radix_tree_scan_hole()
[PATCH 7/7] readahead: basic support of interleaved reads

The diffstat is

 block/ll_rw_blk.c  |9 -
 fs/ext3/dir.c  |2 -
 fs/ext4/dir.c  |2 -
 fs/splice.c|2 -
 include/linux/fs.h |   14 +++-
 include/linux/mm.h |2 -
 include/linux/radix-tree.h |2 +
 lib/radix-tree.c   |   34 
 mm/filemap.c   |   17 +-
 mm/readahead.c |   58 +++
 10 files changed, 86 insertions(+), 56 deletions(-)

Regards,
Fengguang Wu
--


[GIT PULL] please pull infiniband.git

2007-07-20 Thread Roland Dreier
Linus, please pull from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get another small batch of changes for 2.6.23:

Arthur Jones (1):
  IB/ipath: Remove ipath_layer dead code

Florin Malita (1):
  IB/mlx4: Fix leaks in __mlx4_ib_modify_qp

Hoang-Nam Nguyen (3):
  IB/ehca: Support large page MRs
  IB/ehca: Generate async event when SRQ limit reached
  IB/ehca: Move ehca2ib_return_code() out of line

Joachim Fenkes (1):
  IB/ehca: Make internal_create/destroy_qp() static

Michael S. Tsirkin (1):
  IB/mthca: Change command token on timeout

Roland Dreier (2):
  mlx4_core: Change command token on timeout
  IB/mlx4: Fix error path in create_qp_common()

Stefan Roscher (1):
  IB/ehca: Support small QP queues

 drivers/infiniband/hw/ehca/ehca_classes.h |   50 +++--
 drivers/infiniband/hw/ehca/ehca_cq.c  |8 +-
 drivers/infiniband/hw/ehca/ehca_eq.c  |8 +-
 drivers/infiniband/hw/ehca/ehca_irq.c |   42 +++-
 drivers/infiniband/hw/ehca/ehca_main.c|   49 -
 drivers/infiniband/hw/ehca/ehca_mrmw.c|  371 -
 drivers/infiniband/hw/ehca/ehca_mrmw.h|2 +-
 drivers/infiniband/hw/ehca/ehca_pd.c  |   25 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c  |  178 --
 drivers/infiniband/hw/ehca/ehca_tools.h   |   19 +--
 drivers/infiniband/hw/ehca/ehca_uverbs.c  |2 +-
 drivers/infiniband/hw/ehca/hcp_if.c   |   50 +++-
 drivers/infiniband/hw/ehca/ipz_pt_fn.c|  222 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h|   26 ++-
 drivers/infiniband/hw/ipath/Makefile  |1 -
 drivers/infiniband/hw/ipath/ipath_layer.c |  365 
 drivers/infiniband/hw/ipath/ipath_layer.h |   71 --
 drivers/infiniband/hw/ipath/ipath_verbs.h |2 -
 drivers/infiniband/hw/mlx4/qp.c   |   20 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c   |3 +-
 drivers/net/mlx4/cmd.c|3 +-
 21 files changed, 802 insertions(+), 715 deletions(-)
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.c
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.h


Re: where is the code for read system call?

2007-07-20 Thread Folkert van Heusden
 My application reads from socket. I need to change the behavior of read
 system call for an experiment. Can someone point me to code?

Wouldn't it be easier to create a preload-library-wrapper around glibc?
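
A minimal sketch of such a wrapper, assuming Linux with glibc. The counter wrapped_read_calls is invented so the effect is observable; the interposition itself relies on dlsym() with RTLD_NEXT, which needs _GNU_SOURCE.

```c
#define _GNU_SOURCE		/* for RTLD_NEXT */
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter, only here so the wrapper's effect can be observed. */
unsigned long wrapped_read_calls;

/*
 * Same prototype as read(2). When this object is found ahead of libc
 * (e.g. via LD_PRELOAD), the application's calls to read() land here.
 */
ssize_t read(int fd, void *buf, size_t count)
{
	static ssize_t (*real_read)(int, void *, size_t);

	if (!real_read)		/* look up the next definition, normally libc's */
		real_read = (ssize_t (*)(int, void *, size_t))
				dlsym(RTLD_NEXT, "read");
	if (!real_read)
		return -1;

	wrapped_read_calls++;
	/* experimental changes to read() behavior would go here */
	return real_read(fd, buf, count);
}
```

Built as a shared object (for example gcc -shared -fPIC -o wrapread.so wrapread.c -ldl) and run with LD_PRELOAD=./wrapread.so, this changes read() for one application without touching the kernel. Socket code that calls recv() or recvfrom() instead of read() would need those wrapped as well.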


Folkert van Heusden

-- 
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Nigel Cunningham
Hi.

On Saturday 21 July 2007 08:43:20 [EMAIL PROTECTED] wrote:
 On Fri, 20 Jul 2007, Alan Stern wrote:
 
  On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
 
  when doing a suspend-to-ram you get to a point where you just don't use
  any userspace.
 
  What do you mean?  How can you prevent user tasks from running?  That's
  basically what the freezer does, and the whole point of this approach
  is to eliminate the freezer.  Right?
 
  Presumably no tasks at all would be scheduled.
 
  How would you prevent tasks from being scheduled?  How would you
  prevent drivers from deadlocking because in order to put their device
  in a low-power state they need to acquire a lock which is held by a
  user task?
 
 you give up on the suspend because you have no way of getting the user 
 task to give up the lock.
 
 however, kernel locks should not be held by user tasks, user tasks are not 
 expected to behave in rational ways, allowing them to compete with kernel 
 tasks for locks is a sure way to get a deadlock or indefinite stall.
 
 what locks are accessed this way?

Any userspace process can do a syscall. In the process of the syscall, it can 
take kernel locks, and it can schedule (eg, while seeking to take a second 
lock).

Regards,

Nigel




Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Andi Kleen
On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
 We are pleased to announce a project we've been working on for some
 time: the unified x86 architecture tree, or arch/x86 - and we'd like
 to solicit feedback about it.

Well you know my position on this. I think it's a bad idea because
it means we can never get rid of any old junk. IMNSHO arch/x86_64
is significantly cleaner and simpler in many ways than arch/i386 and I would
like to preserve that. Also in general arch/x86_64 is much easier to hack
than arch/i386 because it's easier to regression test and in general
has to care about much less junk. And I don't 
know of any way to ever fix that for i386 besides splitting the old
stuff off completely.

Besides radical file movements like this are bad anyways. They cause
a big break in patchkits and forward/backwards porting that doesn't 
really help anybody.

 This causes double maintenance
 even for functionality that is conceptually the same for the 32-bit and
 the 64-bit tree. (such as support for standard PC platform architecture
 devices)

It's not really the same platform: one is PC hardware going back forever
with zillions of bugs, the other is modern PC platforms with far fewer
bugs and quirks.

To see it otherwise it's more a junkification of arch/x86_64 than
a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386 
at all.

 How did we do it?
 -

 As an initial matter, we made it painstakingly sure that the resulting
 .o files in a 32-bit build are bit for bit equal.

Then you removed not a single line of code duplication, so I don't really
see the point of this.

-Andi


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Andi Kleen
On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
 Hi,

 On 21/07/07, Thomas Gleixner [EMAIL PROTECTED] wrote:
  We are pleased to announce a project we've been working on for some
  time: the unified x86 architecture tree, or arch/x86 - and we'd like
  to solicit feedback about it.
 
  What is this about?

 [..]

  As usual, comments and suggestions are welcome!

 I really like this idea - code duplication is a bad thing.

Did you actually look at the patch? It doesn't have a single line
less duplication than there was before. Everything that could
be easily shared was shared already. 

It's just new window dressing without any real advantages.

-Andi


Re: [PATCH 2/7] console: fix section mismatch warning in vgacon.c

2007-07-20 Thread Sam Ravnborg
On Sat, Jul 21, 2007 at 07:37:29AM +0800, Antonino A. Daplas wrote:
 On Fri, 2007-07-20 at 23:27 +0200, Sam Ravnborg wrote:
  Fix following section mismatch warning:
  WARNING: vmlinux.o(.text+0x121e62): Section mismatch: reference to 
  .init.text:__alloc_bootmem (between 'vgacon_startup' and 
  'vgacon_scrolldelta')
  
  Browsing the code it seems that vgacon_scrollback_startup() is only
  called during the init phase so the reference to the .init.text
  section is OK.
  Teach modpost not to warn using ___init_refok.
  
  Signed-off-by: Sam Ravnborg [EMAIL PROTECTED]
 Acked-by: Antonino Daplas [EMAIL PROTECTED]

Thanks. Will you take care of forwarding it or do we rely
on Andrew in this area?

Sam


Re: [PATCH 6/7] radixtree: introduce radix_tree_scan_hole()

2007-07-20 Thread Andrew Morton
On Sat, 21 Jul 2007 12:43:06 +0800 Fengguang Wu [EMAIL PROTECTED] wrote:

 Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
 for the first hole. It will be used in interleaved readahead.

If you're ever feeling fantastically bored, please consider updating the
userspace radix-tree test harness for this?  Cook up a couple of testcases
for the new functionality?

Thanks.

http://www.zip.com.au/~akpm/linux/patches/stuff/rtth.tar.gz is the latest.
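
As a starting point for such testcases, here is a minimal userspace sketch of the behaviour being asked about. It is a toy model, not the kernel code: a flat bool array stands in for the radix tree, toy_scan_hole() is a hypothetical name, and the return convention (first empty index, or index + max_scan when no hole is found in range) is an assumption, since the patch description quoted above only says the function scans for the first hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: present[i] == true means index i holds an item.
 * Indices at or beyond nr_slots are treated as empty.
 *
 * Returns the first empty index at or after "index", looking at no
 * more than max_scan slots. If every slot looked at is occupied, the
 * return value is index + max_scan (assumed convention).
 */
static unsigned long toy_scan_hole(const bool *present, size_t nr_slots,
                                   unsigned long index, unsigned long max_scan)
{
    unsigned long i;

    assert(present != NULL);

    for (i = 0; i < max_scan; i++) {
        if (index >= nr_slots || !present[index])
            break;      /* empty slot found, stop here */
        index++;        /* slot occupied, try the next one */
    }
    return index;
}
```

Under that assumed convention a caller can tell "no hole in range" apart from "hole found" by checking whether the returned value minus the starting index is at least max_scan.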


Re: [RFC, Announce] Unified x86 architecture, arch/x86

2007-07-20 Thread Steven Rostedt


 On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
 
  I really like this idea - code duplication is a bad thing.

 Did you actually look at the patch? It doesn't have a single line
 less duplication than there was before. Everything that could
 be easily shared was shared already.

 It's just new window dressing without any real advantages.

And did you read what tglx wrote?

This patch was the beginning of the merger, not the end result. It strived
for binary identical images. It was to put everything together as a
_starting_point_!   The next thing to do after this is to start the
merging.

-- Steve



Re: [PATCH 1/8] compacting file_ra_state

2007-07-20 Thread Andi Kleen
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
 
 
 On Sat, 21 Jul 2007, Fengguang Wu wrote:
 
  Sorry, forgot to prefix the patch titles with [readahead].
  Should I repost?
 
 Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 

Haven't the readahead patches already essentially been in -mm* for some time?
I thought the new patches were some restructured code, but essentially
the tested algorithms? 

-Andi


Re: [PATCH] Move KVM, paravirt, lguest, VMI and Xen under arch-level Virtualization option

2007-07-20 Thread Rusty Russell
On Fri, 2007-07-20 at 08:24 +0300, Avi Kivity wrote:
 Rusty Russell wrote:
  Any objections?
 
  Rusty.
  ===
  Having KVM appear in the middle of drivers is kinda strange, and
  having it alone under a menu called virtualization doubly so.
 
  1) Move the Virtualization menu into the arch-specific i386 and
 x86-64 Kconfig.

 
 Virtualization is hardly x86 specific.  How about moving it to
 top-level, and having individual items disable themselves on archs they
 don't apply to?
 
 Otherwise we end up with $NARCH copies of that Kconfig, each slightly
 different.  The top-level entry can be made to depend on the archs that
 actually have some virt capability, so as not to show an empty menu.

I dislike the duplication, too, but 

1) it's a CPU capability, and that's where it belongs in the menu.
2) And as you can see from the difference between the x86_64 and i386
help text, there are real platform differences (and not mentioning
what's under the menu would be kinda cheating).
3) Virtualization doesn't even make sense as an option for some
platforms where it's always on.

Cheers,
Rusty.



[PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.

2007-07-20 Thread Paul Mundt
zone_movable_pfn is presently marked as __initdata and referenced
from adjust_zone_range_for_zone_movable(), which in turn is
referenced by zone_spanned_pages_in_node(). Both of these are
__meminit annotated. When memory hotplug is enabled, this will oops
on a hot-add, due to zone_movable_pfn having been freed.

__meminitdata annotation gives the desired behaviour.

This will only impact platforms that enable both memory hotplug
and ARCH_POPULATES_NODE_MAP.

Signed-off-by: Paul Mundt [EMAIL PROTECTED]

--

 mm/page_alloc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 43cb3b3..40954fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -138,7 +138,7 @@ static unsigned long __meminitdata dma_reserve;
 #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
   unsigned long __initdata required_kernelcore;
   unsigned long __initdata required_movablecore;
-  unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
+  unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
 
   /* movable_zone is the real zone pages in ZONE_MOVABLE are taken from */
   int movable_zone;
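
The lifetime issue the patch addresses can be sketched in userspace as follows. This is a toy model under stated assumptions, not the kernel's macros: the enum and the toy_* helpers are hypothetical, and the rule that __meminitdata is kept resident when memory hotplug is configured (and otherwise behaves like __initdata) reflects my understanding of the annotations at the time rather than anything spelled out in the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: where a variable lives once boot has finished. */
enum toy_section {
    TOY_KEPT,            /* still resident after boot */
    TOY_INIT_DISCARDED   /* freed together with the init sections */
};

/* Models __initdata: always discarded after boot. */
static enum toy_section toy_initdata(void)
{
    return TOY_INIT_DISCARDED;
}

/*
 * Models __meminitdata (assumed behaviour): kept when memory hotplug
 * is configured, otherwise the same as __initdata.
 */
static enum toy_section toy_meminitdata(bool memory_hotplug)
{
    return memory_hotplug ? TOY_KEPT : toy_initdata();
}

/* A hot-add runs after boot, so it may only touch resident data. */
static bool toy_hot_add_is_safe(enum toy_section s, bool memory_hotplug)
{
    assert(s == TOY_KEPT || s == TOY_INIT_DISCARDED);

    if (!memory_hotplug)
        return true;     /* no hot-add path exists to trip over it */
    return s == TOY_KEPT;
}
```

In this model toy_hot_add_is_safe(toy_initdata(), true) is false, which corresponds to the oops described, while the __meminitdata variant is safe whether or not hotplug is enabled.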


Re: [PATCH] virtual sched_clock() for s390

2007-07-20 Thread Jeremy Fitzhardinge
Paul Mackerras wrote:
 Do you think this makes the PURR more useful for CFS, or less?  To me
 it looks like this would mean that CFS can make a more equitable
 distribution of CPU time if, for example, you had 3 runnable tasks on
 a 2-core x dual-threaded machine (4 virtual CPUs).
   

Sounds reasonable to me.  I've proposed in the past that sched_clock
should be scaled by the cpufreq frequency to achieve the same effect
(ie, measure the actual number of cpu cycles that are really available
to tasks).

But more specifically, what you've described is exactly analogous to
hypervisor stolen time, since one thread steals time from the other.

 BTW, what does time spent running during sleep mean?  Does it mean
 time that other tasks are running while this task is sleeping?
   

That's how I interpreted it.  You're only credited for sleeping if
someone else wanted the CPU in the meantime.

J
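
The cpufreq scaling idea in the reply above amounts to simple arithmetic, sketched here in userspace. The function name and parameters are hypothetical, and this is only an illustration of the proportion involved, not a proposal for how sched_clock would actually be changed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale an elapsed wall-clock interval by the fraction of the maximum
 * CPU frequency that was actually available, so that the result
 * approximates "time at full speed".
 *
 * Note: the multiplication can overflow for very large intervals; a
 * real implementation would need a wider intermediate or a
 * precomputed multiply/shift pair.
 */
static uint64_t toy_scale_ns(uint64_t delta_ns, uint32_t cur_khz,
                             uint32_t max_khz)
{
    assert(max_khz > 0);
    assert(cur_khz <= max_khz);

    return delta_ns * cur_khz / max_khz;
}
```

For example, one millisecond spent at half the maximum frequency would be credited as half a millisecond, which is the same shape of correction as subtracting hypervisor stolen time.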


Re: [PATCH] Fix Dreamcast DMA

2007-07-20 Thread Mike Frysinger
On Thursday 19 July 2007, Adrian McMenamin wrote:
 Signed-off by: Adrian McMenamin [EMAIL PROTECTED]
 @@ -183,6 +183,7 @@ int dmac_search_free_channel(const char *dev_id)
   return result;

   atomic_set(&channel->busy, 1);
 +
   return channel->chan;
   }

 @@ -194,7 +195,6 @@ int request_dma(unsigned int chan, const char
 *dev_id)
   struct dma_channel *channel = { 0 };
   struct dma_info *info = get_dma_info(chan);
   int result;
 -
   channel = get_dma_channel(chan);
   if (atomic_xchg(&channel->busy, 1))
   return -EBUSY;
 @@ -387,7 +388,7 @@ int register_dmac(struct dma_info *info)
   }

   list_add(&info->list, &registered_dmac_list);
 -
 +
   return 0;
  }
  EXPORT_SYMBOL(register_dmac);

seems like whitespace noise in here ...
-mike




Re: [PATCH] [updated] PHY fixed driver: rework release path and update phy_id notation

2007-07-20 Thread Andrew Morton
On Thu, 19 Jul 2007 03:38:04 +0400 Vitaly Bordug [EMAIL PROTECTED] wrote:

 
 device_bind_driver() error code returning has been fixed.  
 release() function has been written, so that to free resources 
 in correct way; the release path is now clean. 
  
 Before the rework, it used to cause 
  Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken
  and must be fixed. 
  BUG: at drivers/base/core.c:104 device_release() 
   
  Call Trace:   
   [802ec380] kobject_cleanup+0x53/0x7e 
   [802ec3ab] kobject_release+0x0/0x9 
   [802ecf3f] kref_put+0x74/0x81 
   [8035493b] fixed_mdio_register_device+0x230/0x265 
   [80564d31] fixed_init+0x1f/0x35 
   [802071a4] init+0x147/0x2fb 
   [80223b6e] schedule_tail+0x36/0x92 
   [8020a678] child_rip+0xa/0x12 
   [80311714] acpi_ds_init_one_object+0x0/0x83 
   [8020705d] init+0x0/0x2fb 
   [8020a66e] child_rip+0x0/0x12   
  
  
 Also changed the notation of the fixed phy definition on 
 mdio bus to the form of speed+duplex to make it able to be used by 
 gianfar and ucc_geth that define phy_id strictly as %d:%d and cleaned up 
 the whitespace issues.
  

Confused.  Does the above refer to the difference between this patch and
the previous version, or does it just describe this patch?  Hopefully the
latter, because the former isn't interesting, long-term.

If it _is_ a full standalone description of this patch then it's a bit hard
to follow ;)

 +config FIXED_MII_1000_FDX
 + bool "Emulation for 1000M Fdx fixed PHY behavior"
 + depends on FIXED_PHY
 +
 +config FIXED_MII_AMNT
 +int "Number of emulated PHYs to allocate"
 +depends on FIXED_PHY
 +default 1
 +---help---
 +Sometimes it is required to have several independent emulated
 +PHYs on the bus (in case of multi-eth but phy-less HW for instance).
 +This control will have specified number allocated for each fixed
 +PHY type enabled.

Shouldn't these be runtime options (ie: module parameters)?


 ...

 + *  Private information hoder for mii_bus

tpyo.



Re: libata [ata_piix] still no resume from S3 ?

2007-07-20 Thread Tejun Heo
Rúben Fonseca wrote:
 I wish I could disable this card reader. It is built in on the hardware,
 and there are no drivers for Linux. There is no option on the BIOS to
 disable the device. Is there any way (kernel parameters, magic program,
 etc) to disable this device without opening my laptop to cut the wires?
 :D

OIC.  How about not loading tifm_7xx1 module?  Does that make any
difference?

-- 
tejun

