Re: 2.6.22-rc3 hibernate(?) fails totally - regression (xfs on raid6)

2007-06-06 Thread Tejun Heo
Hello,

David Greaves wrote:
> Just to be clear. This problem is where my system won't resume after s2d
> unless I umount my xfs over raid6 filesystem.

This is really weird.  I don't see how xfs mount can affect this at all.

[--snip--]
> So now this compiles but it does cause the problem:
> 
> umount /huge
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # resumes fine
> 
> mount /huge
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # won't resume

How hard does the machine freeze?  Can you use sysrq?  If so, please
dump sysrq-t.

>> Behavior difference introduced by the
>> reimplementation is serialization of resume sequence, so it takes more
>> time.  My test machine had problems resuming if resume took too long
>> even with the previous implementation.  It didn't matter whether the
>> long resuming sequence is caused by too many controllers or explicit
>> ssleep().  If time needed for resume sequence is over certain threshold,
>> machine hangs while resuming.  I thought it was a BIOS glitch and didn't
>> dig into it but you might be seeing the same issue.
> given the mount/umount thing this sounds unlikely... but what do I know?

No, I don't think this is the same problem either.  The problem I
described happened during resume from s2ram.

> resume does throw up:
> ATA: abnormal status 0x7F on port 0x0001b007
> ATA: abnormal status 0x7F on port 0x0001b007
> ATA: abnormal status 0x7F on port 0x0001a407
> ATA: abnormal status 0x7F on port 0x0001a407
> 
> which I've not noticed before... oh, alright, I'll check...
> reboots to 2.6.21, suspend, resume...
> nope, not output on resume in 2.6.21

The messages don't really matter.

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-usb-devel] [PATCH] bug removing ehci-hcd

2007-06-06 Thread Satyam Sharma

Hi,

I remember this one ...

On 6/7/07, Greg KH <[EMAIL PROTECTED]> wrote:

> On Thu, May 31, 2007 at 10:26:10AM -0500, [EMAIL PROTECTED] wrote:
> >
> > I wasn't actually able to reproduce the bug myself, but I guess it is
> > pretty obvious that I shouldn't have called cpufreq_unregister_notifier
> > with a spinlock held.  I haven't been doing this long enough to know
> > exactly which kernel this patch should be against, so let me know if
> > this isn't good.  Thanks!
> >
> >
> > This patch (for the 2.6.21.3 kernel plus previously sent cpufreq
> > notifier patch) fixes a bug caused by calling
> > cpufreq_unregister_notifier (which can sleep) while holding a spinlock.
> >
> > Signed-off-by: Stuart Hayes <[EMAIL PROTECTED]>
>
> Hm, this doesn't apply to the 2.6.21.3 kernel.

The cpufreq patches only live in -mm as of now ...

> Can you send both patches merged together?
>
> And is the fix already in Linus's tree?


Andrew seems to have already fixed this in the latest -mm
(in this very thread, funnily enough, looks like you missed it
as the subject change broke the threading :-)

[ There is a subtle difference, however: Andrew's fix pushes the
notifier unregistration /after/ the spin_unlock_irq(>lock) critical
section, whereas Stuart seems to prefer doing it /before/ the
corresponding spin_lock_irq() ... ]
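To make the rule behind both variants concrete, here is a user-space sketch: nothing that can sleep (cpufreq_unregister_notifier() in the real driver) may run while a spinlock is held. A C11 atomic_flag stands in for the kernel's spinlock_t, and struct dev_state, toy_spin_lock(), toy_spin_unlock() and unregister_may_sleep() are hypothetical names, not ehci-hcd code. The sketch follows the ordering attributed to Andrew's fix (unregister after the unlock); Stuart's variant (unregister before taking the lock) satisfies the same rule.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical driver state: a toy spinlock plus a flag recording
 * whether a notifier is still registered. */
struct dev_state {
	atomic_flag lock;		/* stands in for spinlock_t */
	bool notifier_registered;
	bool lock_held;			/* bookkeeping used to check the rule */
};

static void toy_spin_lock(struct dev_state *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;			/* busy-wait, as a spinlock does */
	d->lock_held = true;
}

static void toy_spin_unlock(struct dev_state *d)
{
	d->lock_held = false;
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Stand-in for a call that may sleep. Returns false if it was reached
 * with the spinlock held, which is exactly the bug being fixed. */
static bool unregister_may_sleep(struct dev_state *d)
{
	return !d->lock_held;
}

/* Update shared state inside the critical section, remember what still
 * has to be undone, and make the sleeping call only after unlocking. */
static bool dev_teardown(struct dev_state *d)
{
	bool need_unregister;

	toy_spin_lock(d);
	need_unregister = d->notifier_registered;
	d->notifier_registered = false;
	toy_spin_unlock(d);

	if (need_unregister)
		return unregister_may_sleep(d);
	return true;
}
```

Deciding under the lock whether the unregistration is needed, and performing it afterwards, also keeps a second teardown from unregistering twice.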

Satyam


Re: linux-ia64 build warning messages

2007-06-06 Thread Peter Chubb
> "Russ" == Russ Anderson <[EMAIL PROTECTED]> writes:

Russ> Tony Luck wrote:
>> > I used the sn2_defconfig in the tree :)
>> 
>> So there is something odd happening.  Russ complained that he was
>> still seeing several errors from the sn2_defconfig build too when I
>> posted the "last fix" to Len.  But I don't see them when I build.

Russ> An additional data point.  I have a copy of Tony's test tree
Russ> pulled down on March 30th that builds without the warning
Russ> messages.  The copy of Tony's test tree pulled down on May 22nd
Russ> does have warning messages.  I'm building both with the same
Russ> compiler (etc).  I'm fairly certain a tree I pulled down in
Russ> April built without warnings.  I've since blown away that tree.

Commit 85bd2fddd68e757da8e1af98f857f61a3c9ce647 introduced
section-mismatch checking for vmlinux, which made all these warnings
visible.

It looks as if gcc can create references from .sdata to .init.sdata
depending on what optimisations it chooses to do.  Ideally we could
teach gcc to put its constants in the same section they reference.
But I'm no gcc guru.  The alternative is to get modpost to ignore such
references, at the cost of perhaps missing a real problem somewhere.
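Here is a much-simplified sketch of the check Peter describes and of the whitelist alternative he weighs against it. The rule used here is an assumption made for illustration: a non-init section must not reference an .init.* section, because init memory is freed after boot. modpost's real logic has many more cases, and is_section_mismatch() is a hypothetical helper, not a modpost function.

```c
#include <stdbool.h>
#include <string.h>

/* True for init-only sections such as ".init.text" or ".init.sdata". */
static bool is_init_section(const char *sec)
{
	return strncmp(sec, ".init.", 6) == 0;
}

/* Simplified mismatch rule: a reference from a non-init section into an
 * init section is suspicious. If ignore_sdata is set, the gcc-generated
 * .sdata -> .init.sdata case is whitelisted, at the cost of also hiding
 * any genuine bug of that shape. */
static bool is_section_mismatch(const char *from, const char *to,
				bool ignore_sdata)
{
	if (!is_init_section(to) || is_init_section(from))
		return false;
	if (ignore_sdata && strcmp(from, ".sdata") == 0 &&
	    strcmp(to, ".init.sdata") == 0)
		return false;
	return true;
}
```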
--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au   ERTOS within National ICT Australia


[patch/rfc] implement memmem() locally in kallsyms.c

2007-06-06 Thread Mike Frysinger
This patch basically copies the gnulib version of memmem() into
scripts/kallsyms.c.  While it is a useful function, it isn't in POSIX, so
some systems (like Darwin) omit it.  How do others feel?

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -26,8 +26,6 @@
  *
  */
 
-#define _GNU_SOURCE
-
 #include 
 #include 
 #include 
@@ -56,6 +54,37 @@ int token_profit[0x10000];
 unsigned char best_table[256][2];
 unsigned char best_table_len[256];
 
+/* memmem(), while useful, is not in POSIX, so create a local version
+ * so we can compile on non-GNU systems (Darwin, *BSD, etc...)
+ */
+void *memmem(const void *haystack, size_t haystack_len,
+	     const void *needle, size_t needle_len)
+{
+	const char *begin;
+	const char *const last_possible =
+		(const char *)haystack + haystack_len - needle_len;
+
+	/* The first occurrence of the empty string is deemed to occur at
+	 * the beginning of the string.
+	 */
+	if (needle_len == 0)
+		return (void *)haystack;
+
+	/* Sanity check, otherwise the loop might search through the whole
+	 * memory.
+	 */
+	if (haystack_len < needle_len)
+		return NULL;
+
+	for (begin = (const char *)haystack; begin <= last_possible; ++begin)
+		if (begin[0] == ((const char *)needle)[0] &&
+		    !memcmp((const void *)&begin[1],
+			    (const void *)((const char *)needle + 1),
+			    needle_len - 1))
+			return (void *)begin;
+
+	return NULL;
+}
 
 static void usage(void)
 {


Re: [kvm-devel] [PATCH] KVM - Fix rmode_tss_base declaration

2007-06-06 Thread Avi Kivity
Jeff Dike wrote:
> On Thu, Jun 07, 2007 at 10:13:42AM +0800, Li, Xin B wrote:
>   
>>> -static int rmode_tss_base(struct kvm* kvm)
>>> +static unsigned long rmode_tss_base(struct kvm* kvm)
>>>   
>> Should use gpa_t instead.
>> 
>
> Right you are, I didn't notice that type.
>
>   

Some extra logic is needed on i386 with >= 4GB.  The current code will
wrap around, since gfn_t is 32 bits long, but casting it to 64 bits is
not the answer either, since the processor will truncate it back to 32
bits (the return value is eventually used as a long in enter_rmode()).
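Both failure modes can be shown with plain integer types. This sketch assumes 4 KiB pages and a 32-bit frame-number type (gfn32_t is a hypothetical name). It is not KVM code. Shifting in 32-bit arithmetic wraps at 4 GiB. Widening before the shift keeps the full address. Storing the result in a 32-bit long, which is what happens in enter_rmode() according to the message above, discards the high bits again.

```c
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumed 4 KiB pages, as on i386 */

typedef uint32_t gfn32_t;		/* hypothetical 32-bit frame number */

/* Shift performed in 32-bit arithmetic: wraps for addresses >= 4 GiB. */
static uint64_t gfn_to_gpa_wrapping(gfn32_t gfn)
{
	return gfn << PAGE_SHIFT;
}

/* Widen first, then shift: the full 64-bit address survives. */
static uint64_t gfn_to_gpa_widened(gfn32_t gfn)
{
	return (uint64_t)gfn << PAGE_SHIFT;
}

/* Models a consumer that keeps the value in a 32-bit long. */
static uint32_t store_in_32bit_long(uint64_t gpa)
{
	return (uint32_t)gpa;
}
```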


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



2.6.22-rc4-mm2

2007-06-06 Thread Andrew Morton

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm2/

- Basically a bugfixed version of 2.6.22-rc4-mm1.  None of the subsystem
  trees were repulled, several bad patches were dropped, a few were fixed.


Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.
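The "around ten rebuilds" figure for a bisection search comes from binary search. Each build halves the range of suspect patches, so a series of roughly a thousand patches needs about log2(1000) ≈ 10 builds. The following sketch assumes that the bug is present from some patch onward and never disappears again. find_first_bad() and simulated_bug() are hypothetical helpers, and simulated_bug() stands in for a real build, boot and test cycle.

```c
#include <stdbool.h>

/* Tester-supplied predicate: true if the kernel built with the first n
 * patches of the series applied shows the bug. */
typedef bool (*is_bad_fn)(int n, void *ctx);

/* Binary search for the first bad patch. Assumes 0 patches is good and
 * all `total` patches is bad. *builds counts rebuild/reboot cycles. */
static int find_first_bad(int total, is_bad_fn is_bad, void *ctx, int *builds)
{
	int good = 0, bad = total;

	*builds = 0;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		++*builds;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;	/* patch number `bad` introduced the bug */
}

/* Simulated tester for illustration: the bug appears at patch *ctx. */
static bool simulated_bug(int n, void *ctx)
{
	return n >= *(const int *)ctx;
}
```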



Changes since 2.6.22-rc4-mm1:

 git-acpi.patch
 git-alsa.patch
 git-arm-master.patch
 git-arm.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-input.patch
 git-kbuild.patch
 git-kvm.patch
 git-leds.patch
 git-libata-all.patch
 git-md-accel.patch
 git-mips.patch
 git-mmc.patch
 git-ubi.patch
 git-netdev-all.patch
 git-net.patch
 git-backlight.patch
 git-battery.patch
 git-ioat.patch
 git-nfs.patch
 git-nfs-server-cluster-locking-api.patch
 git-ocfs2.patch
 git-parisc.patch
 git-r8169.patch
 git-selinux.patch
 git-s390.patch
 git-sh.patch
 git-scsi-misc.patch
 git-scsi-rc-fixes.patch
 git-scsi-target.patch
 git-unionfs.patch
 git-watchdog.patch
 git-wireless.patch
 git-ipwireless_cs.patch
 git-newsetup.patch
 git-xfs.patch
 git-cryptodev.patch
 git-xtensa.patch
 git-gccbug.patch

 git trees

+char-stallion-dont-fail-with-less-than-max-panels.patch
+char-stallion-alloc-tty-before-pci-devices-init.patch
+char-stallion-proper-fail-return-values.patch
+frv-build-fix.patch
+uml-get-declaration-of-simple_strtoul.patch
+isdn-diva-fix-section-mismatch.patch

 2.6.22 queue

+git-acpi-disable-acpi_processor_throttling_seq_show.patch

 Attempt to stop acpi oopsing

+toshica_acpi-fix-section-mismatch-in-allyesconfig.patch

 section fix

+revert-gregkh-driver-block-device.patch

 Revert dud patch from driver tree

+mac80211-fix-1-bit-bitfield-to-unsigned.patch

 wireless sparse fix

+x86_64-acpi-disable-srat-when-numa-emulation-succeeds-fix.patch

 Fix x86_64-acpi-disable-srat-when-numa-emulation-succeeds.patch

-mmconfig-validate-against-acpi-motherboard-resources.patch

 Dropped due to compilation errors

+paravirt-helper-to-disable-all-io-space-fix-2.patch
+paravirt-helper-to-disable-all-io-space-fix-3.patch

 Fix paravirt-helper-to-disable-all-io-space-fix.patch

+sata_promise-use-tf-interface-for-polling-nodata-commands.patch

 SATA Promise fix

+serial-convert-early_uart-to-earlycon-for-8250-fix.patch

 Fix serial-convert-early_uart-to-earlycon-for-8250.patch for ia64

-slub-use-ilog2-instead-of-series-of-constant-comparisons.patch

 Dropped due to gcc-3.3.3 bustage

+mm-merge-nopfn-into-fault-spufs-fix.patch

 Fix mm-merge-nopfn-into-fault.patch compile

+pm-introduce-hibernation-and-suspend-notifiers-fix-fix.patch

 Fix pm-introduce-hibernation-and-suspend-notifiers.patch compile again

-define-new-percpu-interface-for-shared-data.patch
-use-the-new-percpu-interface-for-shared-data.patch

 Dropped because it caused hangs in Michal's testing

+undeprecate-raw-driver.patch
+hfsplus-change-kmalloc-memset-to-kzalloc.patch
+submitchecklist-update-fix-spelling-error.patch
+fix-typo-in-prefetchh.patch

 Misc fixes

+spi_mpc83xxc-underclocking-hotfix.patch

 Fix an SPI driver

-sane-irq-initialization-in-sedlbauer-hisax.patch
+sane-irq-initialization-in-sedlbauer-hisax.patch

 New, fixed version of this ISDN patch

+matroxfb-color-setting-fixes-fix.patch

 fbdev fix

+schedstats-fix-printk-format.patch

 printk fix

-arch-personality-independent-stack-top.patch
-audit-rework-execve-audit.patch
-audit-rework-execve-audit-fix.patch
-mm-move_page_tables_up.patch
-mm-variable-length-argument-support.patch
-mm-variable-length-argument-support-fix.patch

 

Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-06 Thread Albert Cahalan

On 6/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> wrote:
> > Eric W. Biederman writes:
> > > Badari Pulavarty <[EMAIL PROTECTED]> writes:
> >
> > >> Your recent cleanup to shm code, namely
> > >>
> > >> [PATCH] shm: make sysv ipc shared memory use stacked files
> > >>
> > >> took away one of the debugging features for shm segments.
> > >> Originally, shmids were forced to be the inode numbers and
> > >> they show up in /proc/pid/maps for the process which mapped
> > >> these shared memory segments (vma listing). That way, it's easy
> > >> to find out who all mapped this shared memory segment. Your
> > >> patchset took away the inode# setting, so we can't easily
> > >> match the shmem segments to /proc/pid/maps. (It was
> > >> really useful in tracking down a customer problem recently.)
> > >> Was this done deliberately? Anything wrong in setting this back?
> > >
> > > Theoretically it makes the stacked file concept more brittle,
> > > because it means the lower layers can't care about their inode
> > > number.
> > >
> > > We do need something to tie these things together.
> > >
> > > So I suspect what makes most sense is to simply rename the
> > > dentry SYSVID
> >
> > Please stop breaking things in /proc. The pmap command relies
> > on the old behavior.
>
> What effect did this change have upon the pmap command?  Details, please.
>
> > It's time to revert.
>
> Probably true, but we'd need to understand what the impact was.


Very simply, pmap reports the shmid.

albert 0 ~$ pmap `pidof X` | egrep -2 shmid
30050000  16384K rw-s-  /dev/fb0
31050000    152K rw---  [ anon ]
31076000    384K rw-s-  [ shmid=0x3f428000 ]
310d6000    384K rw-s-  [ shmid=0x3f430001 ]
31136000    384K rw-s-  [ shmid=0x3f438002 ]
31196000    384K rw-s-  [ shmid=0x3f440003 ]
311f6000    384K rw-s-  [ shmid=0x3f448004 ]
31256000    384K rw-s-  [ shmid=0x3f450005 ]
312b6000    384K rw-s-  [ shmid=0x3f460006 ]
31316000    384K rw-s-  [ shmid=0x3f870007 ]
31491000    140K r      /usr/share/fonts/type1/gsfonts/n021003l.pfb
3150e000   9496K rw---  [ anon ]
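To show what was lost, here is roughly how a tool like pmap can derive the shmid column from /proc/<pid>/maps under the old behaviour Badari describes, where the inode number was forced to equal the shmid. SysV segments appear in maps with a /SYSV<key> path. maps_line_shmid() is a hypothetical helper, not procps code, and it only gives the right answer while the inode/shmid identity holds.

```c
#include <stdio.h>
#include <string.h>

/* Parses one /proc/<pid>/maps line. If it maps a SysV shared memory
 * segment (path starting with "/SYSV"), stores the inode column in
 * *shmid and returns 1; otherwise returns 0. */
static int maps_line_shmid(const char *line, unsigned long *shmid)
{
	unsigned long start, end, offset, inode;
	unsigned int major, minor;
	char perms[5];
	int path_pos = -1;

	/* start-end perms offset major:minor inode, then the path */
	if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n", &start, &end, perms,
		   &offset, &major, &minor, &inode, &path_pos) != 7 ||
	    path_pos < 0)
		return 0;
	if (strncmp(line + path_pos, "/SYSV", 5) != 0)
		return 0;
	*shmid = inode;
	return 1;
}
```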


Re: [linux-usb-devel] [PATCH] bug removing ehci-hcd

2007-06-06 Thread Greg KH
On Thu, May 31, 2007 at 10:26:10AM -0500, [EMAIL PROTECTED] wrote:
> 
> I wasn't actually able to reproduce the bug myself, but I guess it is
> pretty obvious that I shouldn't have called cpufreq_unregister_notifier
> with a spinlock held.  I haven't been doing this long enough to know
> exactly which kernel this patch should be against, so let me know if
> this ins't good.  Thanks!
> 
> 
> This patch (for the 2.6.21.3 kernel plus previously sent cpufreq
> notifier patch) fixes a bug caused by calling
> cpufreq_unregister_notifier (which can sleep) while holding a spinlock.
> 
> Signed-off-by: Stuart Hayes <[EMAIL PROTECTED]>

Hm, this doesn't apply to the 2.6.21.3 kernel.

Can you send both patches merged together?

And is the fix already in Linus's tree?

thanks,

greg k-h


Re: [PATCH -mm] Add LZO1X compression support to the kernel

2007-06-06 Thread Nitin Gupta

On 6/6/07, Richard Purdie <[EMAIL PROTECTED]> wrote:

> Nitin: Have you any objections to this version? If not, I'll finish
> analysing the PTR_ code changes and then hopefully we can get something
> into -mm...

Your code now looks nice and clean, but I don't know what you want. I
have already spent a lot of time on version 7, which I posted and which
contains all the corrections suggested for my earlier version. I
cannot ask you to look into possible problems (if any) in my code now,
since you are not interested in that anyway. So, please continue this
duplication.

Thanks,
Nitin


Re: [RFC] [Patch 4/4] lock contention tracking slimmed down

2007-06-06 Thread hui
On Thu, Jun 07, 2007 at 02:17:45AM +0200, Martin Peschke wrote:
> Ingo Molnar wrote:
> >, quite some work went into it - NACK :-(
> 
> Considering the amount of code.. ;-)I am sorry.
> 
> But seriously, did you consider using some user space tool or script to
> format this stuff the way you like it - similar to the way the powertop tool
> reshuffles timer_stats data found in a proc file, for example?

When I was doing my stuff, I intended for it to be parsed by a script or
simple command-line tools like sort/grep piped through less. I also
thought it might be interesting to output the text in Python or Ruby
syntax so that it can go through more extensive sorting using those
languages.

There are roughly 400 locks in a normal desktop kernel. The list is
rather cumbersome anyway, so, IMO, it really should be handled by
parsing tools, etc. There could be more properties attached to each
lock, especially if you intend to get this to work on -rt, which needs
more things reported.
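Leaving the sorting to user space can be as simple as the following sketch, roughly the C equivalent of piping through `sort -k2 -nr`. The two-column "<lock-name> <contentions>" line format and the names used are assumptions made for illustration. A real lock statistics file would have more fields.

```c
#include <stdio.h>
#include <stdlib.h>

/* One record of a hypothetical "<lock-name> <contentions>" format. */
struct lock_stat {
	char name[64];
	unsigned long contentions;
};

/* Parses one line; returns 1 on success, 0 if it doesn't match. */
static int parse_lock_stat(const char *line, struct lock_stat *out)
{
	return sscanf(line, "%63s %lu", out->name, &out->contentions) == 2;
}

/* qsort comparator: most contended lock first. */
static int by_contentions_desc(const void *a, const void *b)
{
	const struct lock_stat *x = a, *y = b;

	if (x->contentions != y->contentions)
		return x->contentions < y->contentions ? 1 : -1;
	return 0;
}

static void sort_lock_stats(struct lock_stat *stats, size_t n)
{
	qsort(stats, n, sizeof(*stats), by_contentions_desc);
}
```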

bill



Re: [PATCH] Kprobes x86_64 fix for mark ro data

2007-06-06 Thread S. P. Prasanna

This patch fixes the problem of page protection introduced by
CONFIG_DEBUG_RODATA for x86_64 architecture. As per Andi
Kleen's suggestion, the kernel text pages are marked writeable
only for a short duration to insert or remove the breakpoints.
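The same temporarily-writable pattern can be tried in user space with POSIX mprotect(). This is only an analogy. The patch itself uses change_page_attr_addr() and global_flush_tlb() inside the kernel. poke_readonly_byte() below is a hypothetical helper. It lifts write protection on one page, stores a single byte (0xCC is the x86 int3 breakpoint opcode), and then restores read-only protection.

```c
#define _DEFAULT_SOURCE 1
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Make the page containing addr writable just long enough to store one
 * byte, then make it read-only again. Returns 0 on success, -1 if
 * write access could not be granted, or the result of the final
 * mprotect() call. */
static int poke_readonly_byte(unsigned char *addr, unsigned char val)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *page = (void *)((uintptr_t)addr & ~((uintptr_t)page_size - 1));

	if (mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0)
		return -1;
	*addr = val;			/* e.g. 0xCC, the int3 breakpoint */
	return mprotect(page, (size_t)page_size, PROT_READ);
}
```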

Signed-off-by: Prasanna S P<[EMAIL PROTECTED]>
Ack-ed-by: Jim Keniston <[EMAIL PROTECTED]>


 arch/x86_64/kernel/kprobes.c |   26 ++
 arch/x86_64/mm/init.c        |    6 +-
 include/asm-x86_64/kprobes.h |   10 ++
 3 files changed, 41 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/kprobes.c~kprobes-mark-ro-data-fix-x86_64 arch/x86_64/kernel/kprobes.c
--- linux-2.6.22-rc2/arch/x86_64/kernel/kprobes.c~kprobes-mark-ro-data-fix-x86_64	2007-06-07 09:20:33.0 +0530
+++ linux-2.6.22-rc2-prasanna/arch/x86_64/kernel/kprobes.c	2007-06-07 09:20:33.0 +0530
@@ -209,16 +209,42 @@ static void __kprobes arch_copy_kprobe(s
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long)p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+		page_readonly = 1;
+	}
 	*p->addr = BREAKPOINT_INSTRUCTION;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	if (page_readonly) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long)p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+		page_readonly = 1;
+	}
+
 	*p->addr = p->opcode;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+
+	if (page_readonly) {
+		change_page_attr_addr(addr, 1, PAGE_KERNEL_RO);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
diff -puN include/asm-x86_64/kprobes.h~kprobes-mark-ro-data-fix-x86_64 include/asm-x86_64/kprobes.h
--- linux-2.6.22-rc2/include/asm-x86_64/kprobes.h~kprobes-mark-ro-data-fix-x86_64	2007-06-07 09:20:33.0 +0530
+++ linux-2.6.22-rc2-prasanna/include/asm-x86_64/kprobes.h	2007-06-07 09:20:33.0 +0530
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define  __ARCH_WANT_KPROBES_INSN_SLOT
 
@@ -88,4 +89,13 @@ extern int kprobe_handler(struct pt_regs
 
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 		unsigned long val, void *data);
+extern int kernel_text_is_ro;
+static inline int kernel_readonly_text(unsigned long address)
+{
+	if (kernel_text_is_ro && ((address >= (unsigned long)_stext)
+		&& (address < (unsigned long) _etext)))
+		return 1;
+
+	return 0;
+}
 #endif /* _ASM_KPROBES_H */
diff -puN arch/x86_64/mm/init.c~kprobes-mark-ro-data-fix-x86_64 arch/x86_64/mm/init.c
--- linux-2.6.22-rc2/arch/x86_64/mm/init.c~kprobes-mark-ro-data-fix-x86_64	2007-06-07 09:20:33.0 +0530
+++ linux-2.6.22-rc2-prasanna/arch/x86_64/mm/init.c	2007-06-07 09:20:33.0 +0530
@@ -48,6 +48,7 @@
 #define Dprintk(x...)
 #endif
 
+int kernel_text_is_ro;
 const struct dma_mapping_ops* dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
@@ -598,10 +599,13 @@ void mark_rodata_ro(void)
 {
 	unsigned long start = (unsigned long)_stext, end;
 
+	kernel_text_is_ro = 1;
 #ifdef CONFIG_HOTPLUG_CPU
 	/* It must still be possible to apply SMP alternatives. */
-	if (num_possible_cpus() > 1)
+	if (num_possible_cpus() > 1) {
 		start = (unsigned long)_etext;
+		kernel_text_is_ro = 0;
+	}
 #endif
 	end = (unsigned long)__end_rodata;
 	start = (start + PAGE_SIZE - 1) & PAGE_MASK;

_
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [EMAIL PROTECTED]
Ph: 91-80-41776329


Re: [PATCH] Kprobes i386 fix for mark ro data

2007-06-06 Thread S. P. Prasanna
On Thu, Jun 07, 2007 at 11:12:32AM +1200, Ian McDonald wrote:
> On 6/7/07, Chuck Ebbert <[EMAIL PROTECTED]> wrote:
> >On 06/06/2007 04:47 PM, Ian McDonald wrote:
> >> Hi there,
> >>
> >> We've seen a report of a problem with dccp_probe as shown below. The
> >> user has also verified that it occurs in tcp_probe as well. This is on
> >> Dave Miller's tree but that currently tracks Linus' tree quite
> >> closely. I do note that it is around 2.6.22-rc2 timeframe so there is
> >> a possibility fixes may have gone in since.
> >>
> >
> >It faulted when it tried to write the breakpoint instruction into the
> >running kernel's executable code. Apparently the kernel code is now marked
> >read-only?
> >
> >
> Yes it would appear to be the case as user has CONFIG_DEBUG_RODATA
> set. Patrick - can you turn this off and retest? It's under Kernel
> Hacking, Write protect kernel read only data structures.
> 
> The list of commits that I see around this are at:
> http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git=search=HEAD=commit=DEBUG_RODATA
> 
> I suspect it's probably one of the latter ones giving the timing.
> 
> I guess there are a couple of solutions here - either make kprobes
> conflict with CONFIG_DEBUG_RODATA so you can do one or the other, or
> look into more detail what access kprobes need.
> 
> Ian

Ian,

Please find below the fix, as suggested by Andi Kleen,
for the problem stated above.

Thanks
Prasanna


This patch fixes the problem of page protection introduced by
CONFIG_DEBUG_RODATA. CONFIG_DEBUG_RODATA marks the text pages as
read-only, hence kprobes is unable to insert breakpoints in the
kernel text. This patch overrides the page protection when adding
or removing a probe for the i386 architecture.

Signed-off-by: Prasanna S P<[EMAIL PROTECTED]>
Ack-ed-by: Jim Keniston <[EMAIL PROTECTED]>



 arch/i386/kernel/kprobes.c |   26 ++
 arch/i386/mm/init.c        |    2 ++
 include/asm-i386/kprobes.h |   12 
 include/asm-i386/pgtable.h |    2 ++
 4 files changed, 42 insertions(+)

diff -puN arch/i386/kernel/kprobes.c~kprobes-mark-ro-data-fix-i386 arch/i386/kernel/kprobes.c
--- linux-2.6.22-rc2/arch/i386/kernel/kprobes.c~kprobes-mark-ro-data-fix-i386	2007-06-07 09:19:26.0 +0530
+++ linux-2.6.22-rc2-prasanna/arch/i386/kernel/kprobes.c	2007-06-07 09:19:26.0 +0530
@@ -169,16 +169,42 @@ int __kprobes arch_prepare_kprobe(struct
 
 void __kprobes arch_arm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long) p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		page_readonly = 1;
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
+		global_flush_tlb();
+	}
+
 	*p->addr = BREAKPOINT_INSTRUCTION;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+
+	if (page_readonly) {
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_disarm_kprobe(struct kprobe *p)
 {
+	unsigned long addr = (unsigned long) p->addr;
+	int page_readonly = 0;
+
+	if (kernel_readonly_text(addr)) {
+		page_readonly = 1;
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RWX);
+		global_flush_tlb();
+	}
 	*p->addr = p->opcode;
 	flush_icache_range((unsigned long) p->addr,
 			   (unsigned long) p->addr + sizeof(kprobe_opcode_t));
+	if (page_readonly) {
+		change_page_attr(virt_to_page(addr), 1, PAGE_KERNEL_RX);
+		global_flush_tlb();
+	}
 }
 
 void __kprobes arch_remove_kprobe(struct kprobe *p)
diff -puN include/asm-i386/kprobes.h~kprobes-mark-ro-data-fix-i386 include/asm-i386/kprobes.h
--- linux-2.6.22-rc2/include/asm-i386/kprobes.h~kprobes-mark-ro-data-fix-i386	2007-06-07 09:19:26.0 +0530
+++ linux-2.6.22-rc2-prasanna/include/asm-i386/kprobes.h	2007-06-07 09:19:26.0 +0530
@@ -26,6 +26,8 @@
  */
 #include 
 #include 
+#include 
+#include 
 
 #define  __ARCH_WANT_KPROBES_INSN_SLOT
 
@@ -90,4 +92,14 @@ static inline void restore_interrupts(st
 
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 		unsigned long val, void *data);
+extern int kernel_text_is_ro;
+static inline int kernel_readonly_text(unsigned long address)
+{
+
+	if (kernel_text_is_ro && ((address >= PFN_ALIGN(_text))
+		&& (address < PFN_ALIGN(_etext))))
+		return 1;
+
+	return 0;
+}
 #endif /* _ASM_KPROBES_H */
diff -puN include/asm-i386/pgtable.h~kprobes-mark-ro-data-fix-i386 include/asm-i386/pgtable.h
--- linux-2.6.22-rc2/include/asm-i386/pgtable.h~kprobes-mark-ro-data-fix-i386	2007-06-07 09:19:26.0 +0530
+++ 

Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-06-06 Thread Greg KH
On Wed, Jun 06, 2007 at 10:11:20PM -0500, [EMAIL PROTECTED] wrote:
>  greg
>  with CONFIG_USB_DEVICE_CLASS=y
>  scanner /dev/scanner- show up xsane is working now
> 
>  SCANNER PROBLEM SOLVED

Great, thanks for verifying this.  This config option is enabled by
default, so you need to work hard to disable it :)

thanks,

greg k-h


Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-06 Thread Andrew Morton
On Wed, 6 Jun 2007 23:27:01 -0400 "Albert Cahalan" <[EMAIL PROTECTED]> wrote:

> Eric W. Biederman writes:
> > Badari Pulavarty <[EMAIL PROTECTED]> writes:
> 
> >> Your recent cleanup to shm code, namely
> >>
> >> [PATCH] shm: make sysv ipc shared memory use stacked files
> >>
> >> took away one of the debugging features for shm segments.
> >> Originally, shmids were forced to be the inode numbers and
> >> they show up in /proc/pid/maps for the process which mapped
> >> these shared memory segments (vma listing). That way, it's easy
> >> to find out who all mapped this shared memory segment. Your
> >> patchset took away the inode# setting, so we can't easily
> >> match the shmem segments to /proc/pid/maps. (It was
> >> really useful in tracking down a customer problem recently.)
> >> Was this done deliberately? Anything wrong in setting this back?
> >
> > Theoretically it makes the stacked file concept more brittle,
> > because it means the lower layers can't care about their inode
> > number.
> >
> > We do need something to tie these things together.
> >
> > So I suspect what makes most sense is to simply rename the
> > dentry SYSVID
> 
> Please stop breaking things in /proc. The pmap command relies
> on the old behavior.

What effect did this change have upon the pmap command?  Details, please.

> It's time to revert.

Probably true, but we'd need to understand what the impact was.


[PATCH] enable interrupts in user path of page fault.

2007-06-06 Thread Steven Rostedt
This is a minor fix, but what is currently there is essentially wrong.
In do_page_fault, if the faulting address from user code happens to be
in kernel address space (int *p = (int *)-1; *p = 0xbed;), then the
do_page_fault handler will jump over the local_irq_enable with the

  goto bad_area_nosemaphore;

But the first thing done there is to see that this is user code and
send SIGSEGV to the user task. During this whole time interrupts are
disabled, and the task cannot be preempted by a higher-priority task.

This patch always enables interrupts in the user path of
bad_area_nosemaphore.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

diff --git a/arch/i386/mm/fault.c b/arch/i386/mm/fault.c
index 29d7d61..1ecb3e4 100644
--- a/arch/i386/mm/fault.c
+++ b/arch/i386/mm/fault.c
@@ -458,6 +458,11 @@ bad_area:
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
 	if (error_code & 4) {
+		/*
+		 * It's possible to have interrupts off here.
+		 */
+		local_irq_enable();
+
 		/* 
 		 * Valid to do another page fault here because this one came 
 		 * from user space.
diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
index bfb62a1..635e58d 100644
--- a/arch/x86_64/mm/fault.c
+++ b/arch/x86_64/mm/fault.c
@@ -476,6 +476,12 @@ bad_area:
 bad_area_nosemaphore:
 	/* User mode accesses just cause a SIGSEGV */
 	if (error_code & PF_USER) {
+
+		/*
+		 * It's possible to have interrupts off here.
+		 */
+		local_irq_enable();
+
 		if (is_prefetch(regs, address, error_code))
 			return;
 




Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-06 Thread Benjamin Herrenschmidt
On Wed, 2007-06-06 at 22:20 -0400, Jeff Dike wrote:
> On Thu, Jun 07, 2007 at 08:43:42AM +1000, Paul Mackerras wrote:
> > What Ben was talking about was stealing a synchronous SEGV from a task
> > without stopping it, and as Ben says that makes no sense.
> > Intercepting a signal and stopping the task is reasonable, and that is
> > what ptrace does, and I assume also UML.
> 
> It is, but I can also see UML stealing the SEGV from the child.  The
> UML skas does this - a ptrace extension, PTRACE_FAULTINFO, is used to
> extract page fault information from the child, and other pieces of the
> patch are used to fix the fault without the child continuing until
> it's fixed.  So, in this case, the child never sees the SEGV.

But you use ptrace and don't steal signals with dequeue_signal() from
another live task, which is ok.

Ben.




Re: [RFC][PATCH] /proc/pid/maps doesn't match "ipcs -m" shmid

2007-06-06 Thread Albert Cahalan

Eric W. Biederman writes:

> Badari Pulavarty <[EMAIL PROTECTED]> writes:
>
>> Your recent cleanup to shm code, namely
>>
>> [PATCH] shm: make sysv ipc shared memory use stacked files
>>
>> took away one of the debugging features for shm segments.
>> Originally, shmids were forced to be the inode numbers, and
>> they show up in /proc/pid/maps for the process which mapped
>> these shared memory segments (vma listing). That way, it's easy
>> to find out who all mapped this shared memory segment. Your
>> patchset took away the inode# setting, so we can't easily
>> match the shmem segments to /proc/pid/maps. (It was
>> really useful in tracking down a customer problem recently.)
>> Is this done deliberately? Anything wrong in setting this back?
>
> Theoretically it makes the stacked file concept more brittle,
> because it means the lower layers can't care about their inode
> number.
>
> We do need something to tie these things together.
>
> So I suspect what makes most sense is to simply rename the
> dentry SYSVID


Please stop breaking things in /proc. The pmap command relies
on the old behavior. It's time to revert. Put back the segment ID
where it belongs, and leave the key where it belongs too.

Containers are NOT worth breaking our ABIs left and right.
We don't need to leap off that bridge just because Solaris did,
unless you can explain why complexity and bloat are desirable.
We already have SE Linux, chroot, KVM, and several more!
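For reference, a sketch of the matching Badari describes: pull the inode column out of a /proc/pid/maps line for a SysV shm mapping, which under the old behavior equaled the shmid shown by ipcs -m. The helper name and the sample lines in the test are made up for illustration, and the /SYSV path prefix is an assumption about how shm mappings are labelled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/<pid>/maps line. If the mapping's path starts with
 * "/SYSV" (assumed SysV shm label), store the inode column and return 1;
 * otherwise return 0. */
static int maps_line_sysv_inode(const char *line, unsigned long *inode)
{
    unsigned long start, end, offset, ino;
    unsigned int dev_major, dev_minor;
    char perms[5];
    int path_off = 0;

    if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n",
               &start, &end, perms, &offset, &dev_major, &dev_minor,
               &ino, &path_off) < 7)
        return 0;
    if (path_off == 0 || strncmp(line + path_off, "/SYSV", 5) != 0)
        return 0;
    *inode = ino;
    return 1;
}
```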


Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

2007-06-06 Thread Benjamin Herrenschmidt
On Wed, 2007-06-06 at 08:35 -0700, Linus Torvalds wrote:
> So now we should do "recalc_sigpending()" only when signals may be
> *added* (where messing with the "blocked" mask obviously is a form of
> adding signals, and possibly the most common reason for having to
> recalculate the sigpending mask).
> 
> Comments? This patch is _entirely_ and utterly untested, so I'm only 
> saying that this "feels" safer and more correct to me.

Oh and we still need to at least do the if (tsk == current) thingy
for the DRM notifier hack...

Ben.




Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

2007-06-06 Thread Benjamin Herrenschmidt
On Wed, 2007-06-06 at 08:35 -0700, Linus Torvalds wrote:
> 
> So I think that the *right* place to clear TIF_SIGPENDING is actually in 
> "get_signal_to_deliver()", because that function is called _only_ by the 
> actual per-architecture "I'm going to deliver a signal now".

That was my initial idea, but it has an issue with kernel
threads :-( There are in-kernel thingies that use signals to a certain
extent and rely on being able to dequeue and/or clear sigpending (or
else they just loop instead of waiting in various loops).

A bit of a can of worms if you ask me.

Ben.




Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

2007-06-06 Thread Benjamin Herrenschmidt
On Wed, 2007-06-06 at 03:59 -0700, Roland McGrath wrote:
> Oleg and I were just discussing this issue in relation to other problems.
> We established that it is never safe to clear TIF_SIGPENDING on another
> thread.  But I hadn't really thought through that it's sometimes not safe
> to clear your own TIF_SIGPENDING either.  That is, any time you are not
> positive you cannot be in a syscall that will return a -ERESTART* code.
> (I had the ptrace_stop case lurking in the back of my mind but hadn't
> considered how it would really come up.)

Ah, I missed that ptrace_stop() case indeed.

> I have a general recollection of thinking that dequeue_signal could only
> be called on current and that it mattered somehow.

It matters with that, and with the notifier thingy. It might matter for
other things as well; I don't know for sure yet. Signalfd, though, will
definitely cause it to be called for !current, although with my patch
only for shared signals.

>   But aside from
> avoiding recalc_sigpending, and kernel threads with notifier_mask set, I
> can't see off hand what it is.  I won't testify that I think signalfd is
> necessarily on safe ground, though.

Yeah, I'm a little worried too, the whole thing seems a bit fragile to
me.

I'm tempted to in fact make dequeue_signal() the only one to ever
clear TIF_SIGPENDING, and only when tsk==current, but then that does
mean there are a few cases, like when explicitly masking signals, where
we might end up with it spuriously set... thus causing spurious -EINTR
returns for blocked signals.

I must admit at this point it's becoming almost tempting to split it
into two different flags. TIF_SIGPENDING would then be the exact logic
"there is currently a non-blocked signal pending" and could be cleared
at any time provided the siglock is held (which, btw, is another source
of headaches if you grep... people manipulate these things here or there
without the lock). And another one, TIF_SIGNALED, which would be set
whenever TIF_SIGPENDING is set, only ever cleared by
dequeue_signal, and which causes the exception path to get into do_signal.

That would mean occasional spurious trips into do_signal, but that's
totally harmless and it wouldn't, I believe, happen often enough to
impact performance significantly.
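The two-flag idea can be modelled in user space to see where the spurious trips come from. This is a hypothetical sketch, not kernel code: the struct, the bitmask layout and the function names are invented, with sigpending standing in for TIF_SIGPENDING and signaled for the proposed TIF_SIGNALED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposal; not kernel code. */
struct task_model {
    unsigned long pending;  /* bit n set: signal n+1 is pending */
    unsigned long blocked;  /* bit n set: signal n+1 is blocked */
    bool sigpending;        /* exact: an unblocked signal is pending */
    bool signaled;          /* sticky: only dequeue clears it */
};

/* Recompute the exact flag; set the sticky flag whenever it is set. */
static void model_recalc(struct task_model *t)
{
    t->sigpending = (t->pending & ~t->blocked) != 0;
    if (t->sigpending)
        t->signaled = true;
}

/* Exit-path check: enter do_signal() whenever the sticky flag is set. */
static bool model_needs_do_signal(const struct task_model *t)
{
    return t->signaled;
}

/* Dequeue the lowest unblocked signal; this is the only place that
 * clears 'signaled'. Returns the signal number, or 0 if none. */
static int model_dequeue(struct task_model *t)
{
    unsigned long ready = t->pending & ~t->blocked;
    int sig = 0;

    if (ready) {
        while (!(ready & (1UL << sig)))
            sig++;
        t->pending &= ~(1UL << sig);
        sig += 1;
    }
    t->signaled = false;
    model_recalc(t);
    return sig;
}
```

Blocking a pending signal clears the exact flag immediately, while the sticky flag still causes one harmless trip into the dequeue path, which matches the "occasional spurious trips" above.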

Cheers,
Ben.
 



Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-06 Thread Benjamin Herrenschmidt
On Wed, 2007-06-06 at 08:52 -0400, Jeff Dike wrote:
> On Wed, Jun 06, 2007 at 12:50:04PM +1000, Benjamin Herrenschmidt wrote:
> > Yeah, synchronous signals should probably never be delivered to another
> > process, even via signalfd. There's no point delivering a SEGV to
> > somebody else :-)
> 
> Sure there is.  UML does exactly that - intercepting child signals
> (including SEGV) with wait.

UML is definitely what I call a special case :-) Now the question is:
how do you get them? Are you calling dequeue_signal() from another
context via some code path I haven't figured out?

Ben.




Re: [kvm-devel] [PATCH] KVM - Fix rmode_tss_base declaration

2007-06-06 Thread Jeff Dike
On Thu, Jun 07, 2007 at 10:13:42AM +0800, Li, Xin B wrote:
> >-static int rmode_tss_base(struct kvm* kvm)
> >+static unsigned long rmode_tss_base(struct kvm* kvm)
> 
> Should use gpa_t instead.

Right you are, I didn't notice that type.

Will fix.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-06-06 Thread art

Greg,
with CONFIG_USB_DEVICE_CLASS=y the scanner (/dev/scanner-) shows up and
xsane is working now.

SCANNER PROBLEM SOLVED

thanx

xboom


Re: signalfd API issues (was Re: [PATCH/RFC] signal races/bugs, losing TIF_SIGPENDING and other woes)

2007-06-06 Thread Jeff Dike
On Thu, Jun 07, 2007 at 08:43:42AM +1000, Paul Mackerras wrote:
> What Ben was talking about was stealing a synchronous SEGV from a task
> without stopping it, and as Ben says that makes no sense.
> Intercepting a signal and stopping the task is reasonable, and that is
> what ptrace does, and I assume also UML.

It is, but I can also see UML stealing the SEGV from the child.  The
UML skas does this - a ptrace extension, PTRACE_FAULTINFO, is used to
extract page fault information from the child, and other pieces of the
patch are used to fix the fault without the child continuing until
it's fixed.  So, in this case, the child never sees the SEGV.
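A minimal Linux-only sketch of intercepting a child's SIGSEGV with plain ptrace (not the UML skas PTRACE_FAULTINFO extension): the traced child stops before the signal is delivered, and the tracer resumes it with signal 0, so the child never sees the SEGV. The child raises the signal with raise() rather than faulting for real, because resuming a real fault without fixing it would simply fault again. The function name and exit codes are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a traced child that raises SIGSEGV and swallow the signal in the
 * tracer. Returns the child's exit code (7 if it ran on past the
 * suppressed SIGSEGV), or -1 on failure. */
static int intercept_child_segv(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: become traceable by the parent, then signal itself. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        raise(SIGSEGV);
        _exit(7);               /* reached only if the SIGSEGV was discarded */
    }

    /* Parent: the child stops before the signal is delivered to it. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    if (WSTOPSIG(status) != SIGSEGV) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    /* Resume with signal 0: the pending SIGSEGV is dropped. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```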

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: usb-scanner-cameras kernel-2.6.22 and udev-095 problem

2007-06-06 Thread art

Greg, this is part of my config
(we are talking now 2.6.22-rc4-200706042030-cfq7 #160 SMP PREEMPT Mon  
Jun 4 20:55:02 CDT 2007 x86_64 x86_64 x86_64 GNU/Linux)


CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set -
# CONFIG_USB_DYNAMIC_MINORS is not set
CONFIG_USB_SUSPEND=y
# CONFIG_USB_OTG is not set

I disabled libusual in my config:
# CONFIG_USB_LIBUSUAL is not set
With libusual=y my USB hard drives don't show up on the desktop after
boot until I cycle one off/on; from that moment on all of them start
showing up.
With libusual not set the USB hard drives show up, but this doesn't affect
scanners.


BTW, scanners are working with the standard FC7 x86-64 kernel.

xboom


Re: 2.6.22-rc4-mm1

2007-06-06 Thread WANG Cong
On Wed, Jun 06, 2007 at 11:09:31AM -0700, Andrew Morton wrote:
>On Thu, 7 Jun 2007 00:19:36 +0800 WANG Cong <[EMAIL PROTECTED]> wrote:
>
>> On Wed, Jun 06, 2007 at 02:07:37AM -0700, Andrew Morton wrote:
>> >
>> >ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm1/
>> >
>> >- Somebody broke it on my powerpc G5, but I didn't have time to do yet
>> >  another bisection yet.
>> >
>> 
>> It seems strange that a new C source file (mlguest.c) appears in the top
>> dir of the kernel source. There are some problems with it.
>> 
>> First, I used `make mlguest.o` to compile that file, but I got tons of
>> warnings and errors. (Too many to put here.) What's wrong with it? Or
>> didn't I compile/configure it correctly?
>> 
>> Second, mlguest.c #includes a header file named
>> "../../include/linux/lguest_launcher.h".
>> Since mlguest.c is in the top dir, where is
>> ../../include/linux/lguest_launcher.h?
>> 
>
>Confused.  I've grepped the entire universe here for "mlguest" and came up
>with nothing.
>
>I don't have a clue where that file came from on your system.


I used 'ketchup' to update my kernel from -rc3 to -rc4-mm1. I got the following:

[EMAIL PROTECTED] linux-2.6.22-rc4-mm1]$ ls
arch     Documentation  ipc          Makefile        README          System.map
block    drivers        Kbuild       mlguest.c       REPORTING-BUGS  usr
COPYING  fs             kernel       mm              scripts         vmlinux
CREDITS  include        lib          Module.symvers  security
crypto   init           MAINTAINERS  net             sound

Maybe there's something wrong with ketchup. ;(




[PATCH -mm] MMCONFIG: validate against ACPI motherboard resources (revised)

2007-06-06 Thread Robert Hancock
This patch adds validation of the MMCONFIG table against the ACPI reserved
motherboard resources. If the MMCONFIG table is found to be reserved in
ACPI, we don't bother checking the E820 table. The PCI Express firmware spec
apparently tells BIOS developers that reservation in ACPI is required and
E820 reservation is optional, so checking against ACPI first makes sense.
Many BIOSes don't reserve the MMCONFIG region in E820 even though it is
perfectly functional; the existing check needlessly disables MMCONFIG in
these cases.

In order to do this, MMCONFIG setup has been split into two phases. If PCI
configuration type 1 is not available then MMCONFIG is enabled early as before.
Otherwise, it is enabled later after the ACPI interpreter is enabled, since we
need to be able to execute control methods in order to check the ACPI reserved
resources. Presently this is just triggered off the end of ACPI interpreter
initialization.

There are a few other behavioral changes here:

- Validate all MMCONFIG configurations provided, not just the first one.

- Validate that the entire required length of each configuration, as
determined by the provided ending bus number, is reserved, not just the
minimum required allocation.

- Validate that the area is reserved even if we read it from the chipset
directly and not from the MCFG table. This catches the case where the BIOS
didn't set the location properly in the chipset and has mapped it over
other things it shouldn't have.
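A sketch of the length and containment checks described above, assuming the usual MMCONFIG (ECAM) layout where the table's base address corresponds to bus 0 and each bus takes 1 MiB of address space (32 devices x 8 functions x 4 KiB). The containment test mirrors the comparison made in check_mcfg_resource() in the patch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bus = 32 devices x 8 functions x 4 KiB of config space = 1 MiB. */
#define MMCFG_BUS_SIZE (1ULL << 20)

/* First byte of the window for start_bus (base is assumed to map bus 0). */
static uint64_t mmcfg_window_start(uint64_t base, unsigned int start_bus)
{
    return base + (uint64_t)start_bus * MMCFG_BUS_SIZE;
}

/* Last byte of the window needed to reach end_bus. */
static uint64_t mmcfg_window_end(uint64_t base, unsigned int end_bus)
{
    return base + ((uint64_t)end_bus + 1) * MMCFG_BUS_SIZE - 1;
}

/* The whole [start, end] range must lie inside [res_base, res_base + res_len). */
static int range_is_reserved(uint64_t start, uint64_t end,
                             uint64_t res_base, uint64_t res_len)
{
    return start >= res_base && end < res_base + res_len;
}
```

Checking only the first 1 MiB would accept a reservation that covers bus 0 alone, even when the table claims buses up to 255.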

This also cleans up the MMCONFIG initialization functions so that they simply
do nothing if MMCONFIG is not compiled in.

Based on an original patch by Rajesh Shah from Intel.

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

---

This should fix up the compile error in the previous version with
CONFIG_PCI_MMCONFIG=n.

diff -rup --exclude-from=linux-2.6.22-rc4-mm1edit/Documentation/dontdiff linux-2.6.22-rc4-mm1/arch/i386/pci/init.c linux-2.6.22-rc4-mm1edit/arch/i386/pci/init.c
--- linux-2.6.22-rc4-mm1/arch/i386/pci/init.c   2007-06-06 18:49:08.0 -0600
+++ linux-2.6.22-rc4-mm1edit/arch/i386/pci/init.c   2007-06-06 19:03:16.0 -0600
@@ -1,4 +1,5 @@
 #include 
+#include 
 #include 
 #include "pci.h"
 
@@ -11,9 +12,7 @@ static __init int pci_access_init(void)
 #ifdef CONFIG_PCI_DIRECT
type = pci_direct_probe();
 #endif
-#ifdef CONFIG_PCI_MMCONFIG
-   pci_mmcfg_init(type);
-#endif
+   pci_mmcfg_early_init(type);
if (raw_pci_ops)
return 0;
 #ifdef CONFIG_PCI_BIOS
diff -rup --exclude-from=linux-2.6.22-rc4-mm1edit/Documentation/dontdiff linux-2.6.22-rc4-mm1/arch/i386/pci/mmconfig-shared.c linux-2.6.22-rc4-mm1edit/arch/i386/pci/mmconfig-shared.c
--- linux-2.6.22-rc4-mm1/arch/i386/pci/mmconfig-shared.c   2007-06-06 18:49:08.0 -0600
+++ linux-2.6.22-rc4-mm1edit/arch/i386/pci/mmconfig-shared.c   2007-06-06 18:50:46.0 -0600
@@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso
pci_mmcfg_resources_inserted = 1;
 }
 
-static void __init pci_mmcfg_reject_broken(int type)
+static acpi_status __init check_mcfg_resource(struct acpi_resource *res,
+ void *data)
+{
+   struct resource *mcfg_res = data;
+   struct acpi_resource_address64 address;
+   acpi_status status;
+
+   if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) {
+   struct acpi_resource_fixed_memory32 *fixmem32 =
+   &res->data.fixed_memory32;
+   if (!fixmem32)
+   return AE_OK;
+   if ((mcfg_res->start >= fixmem32->address) &&
+   (mcfg_res->end < (fixmem32->address +
+ fixmem32->address_length))) {
+   mcfg_res->flags = 1;
+   return AE_CTRL_TERMINATE;
+   }
+   }
+   if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) &&
+   (res->type != ACPI_RESOURCE_TYPE_ADDRESS64))
+   return AE_OK;
+
+   status = acpi_resource_to_address64(res, &address);
+   if (ACPI_FAILURE(status) ||
+  (address.address_length <= 0) ||
+  (address.resource_type != ACPI_MEMORY_RANGE))
+   return AE_OK;
+
+   if ((mcfg_res->start >= address.minimum) &&
+   (mcfg_res->end < (address.minimum + address.address_length))) {
+   mcfg_res->flags = 1;
+   return AE_CTRL_TERMINATE;
+   }
+   return AE_OK;
+}
+
+static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl,
+   void *context, void **rv)
+{
+   struct resource *mcfg_res = context;
+
+   acpi_walk_resources(handle, METHOD_NAME__CRS,
+   check_mcfg_resource, context);
+
+   if (mcfg_res->flags)
+   return AE_CTRL_TERMINATE;
+
+   return AE_OK;
+}
+
+static int __init is_acpi_reserved(unsigned long start, unsigned long end)
+{
+   struct resource mcfg_res;
+
+   mcfg_res.start 

RE: [kvm-devel] [PATCH] KVM - Fix rmode_tss_base declaration

2007-06-06 Thread Li, Xin B
>
>The long return value of rmode_tss_base is truncated by its declared
>return type of int.
>
>Signed-off-by: Jeff Dike <[EMAIL PROTECTED]>
>--
> drivers/kvm/vmx.c |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>Index: kvm/drivers/kvm/vmx.c
>===
>--- kvm.orig/drivers/kvm/vmx.c
>+++ kvm/drivers/kvm/vmx.c
>@@ -884,7 +884,7 @@ static void enter_pmode(struct kvm_vcpu 
>   vmcs_write32(GUEST_CS_AR_BYTES, 0x9b);
> }
> 
>-static int rmode_tss_base(struct kvm* kvm)
>+static unsigned long rmode_tss_base(struct kvm* kvm)

Should use gpa_t instead.
-Xin 
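Why an int return type is a problem for an address: a sketch assuming gpa_t is a 64-bit unsigned guest-physical address type, using a made-up base above 2 GiB. Converting such a value to int is implementation-defined, so the exact truncated value is not portable, but it can never round-trip back to the original address. The function names are hypothetical stand-ins for the old and fixed declarations.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gpa_t;   /* assumption: 64-bit guest-physical address */

/* Old declaration style: the address is squeezed through an int. */
static int tss_base_as_int(gpa_t base)
{
    return (int)base;     /* implementation-defined if base > INT_MAX */
}

/* Fixed declaration style: the full address is returned. */
static gpa_t tss_base_as_gpa(gpa_t base)
{
    return base;
}
```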


Re: [PATCH 3/6] lguest suppress IDE probing

2007-06-06 Thread Rusty Russell
On Wed, 2007-06-06 at 11:23 +0100, Alan Cox wrote:
> > > Better yet just don't compile in the old IDE stuff, lguest doesn't have a
> > > PCI or ISA bus anyway.
> > 
> > Sure, but the "run the same kernel as guest and host" is a really nice
> > feature.
> 
> Modules dear boy, modules ;)

For some reason, pulling half the kernel's brains out into a separately
maintained userspace seems to make things less reliable.  I always build
in everything I need to boot.

Perhaps this makes me an old-timer.

> > > Alternatively make the IDE I/O space return 0xFF and it'll skip them
> > > anyway.
> > 
> > Hmm, every "in" should be returning 0xFFs, but I still get the delay and
> > the probing.  Xen domU gets it too.
> 
> Can you see in a debugger where it is spending the time. 0xFF should be
> taken as "no port, move on nothing to see"

Well, the code is a little opaque to me, but do_probe() calls msleep(50)
three times.  According to gdb this gets called 27 times -> 4.05
seconds.
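The arithmetic behind the 4.05 second figure, taking the counts from the gdb session above. Since msleep() sleeps for at least the requested time, this is a lower bound; the function name is hypothetical.

```c
#include <assert.h>

/* Minimum time spent sleeping in IDE probing: each probe call sleeps
 * sleeps_per_call times for sleep_ms milliseconds each. */
static unsigned int probe_sleep_ms(unsigned int calls,
                                   unsigned int sleeps_per_call,
                                   unsigned int sleep_ms)
{
    return calls * sleeps_per_call * sleep_ms;
}
```

With 27 calls, 3 sleeps per call and 50 ms per sleep this gives 4050 ms.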

Cheers,
Rusty.




[PATCH 001 of 2] Fix read/truncate race.

2007-06-06 Thread NeilBrown

do_generic_mapping_read currently samples the i_size at the start
and doesn't do so again unless it needs to call ->readpage to load
a page.  After ->readpage it has to re-sample i_size as a truncate
may have caused that page to be filled with zeros, and the read()
call should not see these.

However there are other activities that might cause ->readpage to be
called on a page between the time that do_generic_mapping_read
samples i_size and when it finds that it has an uptodate page.  These
include at least read-ahead and possibly another thread performing a
read.

So do_generic_mapping_read must sample i_size *after* it has an
uptodate page.  Thus the current sampling at the start and after a read
can be replaced with a sampling before the copy-out.
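A sketch of the end-of-file arithmetic that has to run after the page is known to be uptodate, modelled on the end_index calculation in do_generic_mapping_read. A 4 KiB page size and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Bytes that may be copied from page 'index' starting at 'offset',
 * given an i_size sampled after the page was found uptodate. */
static unsigned long bytes_before_eof(uint64_t isize, unsigned long index,
                                      unsigned long offset)
{
    uint64_t end_index;
    unsigned long nr;

    if (isize == 0)
        return 0;
    end_index = (isize - 1) >> MODEL_PAGE_SHIFT;
    if (index > end_index)
        return 0;                       /* page lies wholly past EOF */
    nr = MODEL_PAGE_SIZE;
    if (index == end_index) {
        /* Last page: only the bytes up to i_size are valid. */
        nr = (unsigned long)((isize - 1) & (MODEL_PAGE_SIZE - 1)) + 1;
        if (nr <= offset)
            return 0;
    }
    return nr - offset;
}
```

Sampling isize before the page is uptodate would let a truncate slip in between, so this calculation could count zero-filled bytes as file data.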

The same change is applied to __generic_file_splice_read.

Note that this fixes any race with truncate_complete_page, but does
not fix a possible race with truncate_partial_page.  If a partial
truncate happens after do_generic_mapping_read samples i_size and
before the copy_out, the NULs that truncate_partial_page places in the
page could be copied out incorrectly.

I think the best fix for that is to *not* zero out parts of the page
in truncate_partial_page, but rather to zero out the tail of a page
when increasing i_size.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Cc: Jens Axboe <[EMAIL PROTECTED]>

Acked-by: Nick Piggin <[EMAIL PROTECTED]>
   (for the do_generic_mapping_read part)

### Diffstat output
 ./fs/splice.c  |   43 +-
 ./mm/filemap.c |   72 ++---
 2 files changed, 50 insertions(+), 65 deletions(-)

diff .prev/fs/splice.c ./fs/splice.c
--- .prev/fs/splice.c   2007-06-07 11:34:16.0 +1000
+++ ./fs/splice.c   2007-06-07 11:34:23.0 +1000
@@ -412,31 +412,32 @@ __generic_file_splice_read(struct file *
break;
}
 
-   /*
-* i_size must be checked after ->readpage().
-*/
-   isize = i_size_read(mapping->host);
-   end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
-   if (unlikely(!isize || index > end_index))
-   break;
+   }
+   fill_it:
+   /*
+* i_size must be checked after PageUptodate.
+*/
+   isize = i_size_read(mapping->host);
+   end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+   if (unlikely(!isize || index > end_index))
+   break;
 
+   /*
+* if this is the last page, see if we need to shrink
+* the length and stop
+*/
+   if (end_index == index) {
+   loff = PAGE_CACHE_SIZE - (isize & ~PAGE_CACHE_MASK);
+   if (total_len + loff > isize)
+   break;
/*
-* if this is the last page, see if we need to shrink
-* the length and stop
+* force quit after adding this page
 */
-   if (end_index == index) {
-   loff = PAGE_CACHE_SIZE - (isize & ~PAGE_CACHE_MASK);
-   if (total_len + loff > isize)
-   break;
-   /*
-* force quit after adding this page
-*/
-   len = this_len;
-   this_len = min(this_len, loff);
-   loff = 0;
-   }
+   len = this_len;
+   this_len = min(this_len, loff);
+   loff = 0;
}
-fill_it:
+
partial[page_nr].offset = loff;
partial[page_nr].len = this_len;
len -= this_len;

diff .prev/mm/filemap.c ./mm/filemap.c
--- .prev/mm/filemap.c  2007-06-07 11:34:16.0 +1000
+++ ./mm/filemap.c  2007-06-07 11:34:24.0 +1000
@@ -871,13 +871,11 @@ void do_generic_mapping_read(struct addr
 {
struct inode *inode = mapping->host;
unsigned long index;
-   unsigned long end_index;
unsigned long offset;
unsigned long last_index;
unsigned long next_index;
unsigned long prev_index;
unsigned int prev_offset;
-   loff_t isize;
int error;
struct file_ra_state ra = *_ra;
 
@@ -888,27 +886,12 @@ void do_generic_mapping_read(struct addr
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
 
-   isize = i_size_read(inode);
-   if (!isize)
-   goto out;
-
-   end_index = (isize 

Re: [PATCH pata-2.6 fix] hpt366: disallow Ultra133 for HPT374

2007-06-06 Thread Andrew Morton
On Wed, 6 Jun 2007 23:53:28 +0400 Sergei Shtylyov <[EMAIL PROTECTED]> wrote:

> Eliminate UltraATA/133 support for HPT374 -- the chip isn't capable of this mode
> according to the manual, and doesn't even seem to tolerate 66 MHz DPLL clock...
> 
> Signed-off-by: Sergei Shtylyov <[EMAIL PROTECTED]>
> 
> ---
>  drivers/ide/pci/hpt366.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6/drivers/ide/pci/hpt366.c
> ===
> --- linux-2.6.orig/drivers/ide/pci/hpt366.c
> +++ linux-2.6/drivers/ide/pci/hpt366.c
> @@ -1,5 +1,5 @@
>  /*
> - * linux/drivers/ide/pci/hpt366.cVersion 1.03May 4, 2007
> + * linux/drivers/ide/pci/hpt366.cVersion 1.04Jun 4, 2007

argh.  Please just delete this version numbering.  It's a sure-fire way of
maximising patch conflicts.

It's 1.10 in Bart's tree.

>   * Copyright (C) 1999-2003   Andre Hedrick <[EMAIL PROTECTED]>
>   * Portions Copyright (C) 2001   Sun Microsystems, Inc.
> @@ -106,7 +106,8 @@
>   *   switch  to calculating  PCI clock frequency based on the chip's base DPLL
>   *   frequency
>   * - switch to using the  DPLL clock and enable UltraATA/133 mode by default on
> - *   anything  newer than HPT370/A
> + *   anything  newer than HPT370/A (except HPT374 that is not capable of this
> + *   mode according to the manual)
>   * - fold PCI clock detection and DPLL setup code into init_chipset_hpt366(),
>   *   also fixing the interchanged 25/40 MHz PCI clock cases for HPT36x chips;
>   *   unify HPT36x/37x timing setup code and the speedproc handlers by joining
> @@ -365,7 +366,6 @@ static u32 sixty_six_base_hpt37x[] = {
>  };
>  
>  #define HPT366_DEBUG_DRIVE_INFO  0
> -#define HPT374_ALLOW_ATA133_61
>  #define HPT371_ALLOW_ATA133_61
>  #define HPT302_ALLOW_ATA133_61
>  #define HPT372_ALLOW_ATA133_61
> @@ -450,7 +450,7 @@ static struct hpt_info hpt370a __devinit
>  
>  static struct hpt_info hpt374 __devinitdata = {
>   .chip_type  = HPT374,
> - .max_mode   = HPT374_ALLOW_ATA133_6 ? 4 : 3,
> + .max_mode   = 3,
>   .dpll_clk   = 48,
>   .settings   = hpt37x_settings
>  };

The code in Bart's tree has

static struct hpt_info hpt374 __devinitdata = {
.chip_type  = HPT374,
.max_ultra  = HPT374_ALLOW_ATA133_6 ? 6 : 5,
.dpll_clk   = 48,
.settings   = hpt37x_settings
};

I can handle the renaming, but note that Bart's tree has different values
as well.



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Rob Landley wrote:
> On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
>> This makes vmlinux (normally stripped) recoverable from the bzImage file
>> and so anything that is currently booting vmlinux would be serviced by
>> this scheme.
> 
> Would this make it sane to strip the initramfs image out of vmlinux with 
> objdump and replace it with another one, or are there offsets resolved during 
> the build that stop that for vmlinux?
> 

There probably are offsets resolved during the build.  However, that
wouldn't be all that hard to fix.  Still, one can argue whether or not
it is sane under any definition to do this kind of unpacking-repacking
of ELF files.

-hpa



[PATCH 000 of 2] Fix some bugs with 'read' racing with 'truncate'

2007-06-06 Thread NeilBrown
The following two patches fix a couple of bugs which trigger when read
races with truncate.

As there is no locking between read and truncate, we need to be
careful about sequencing.   In some cases we aren't careful enough.

The first patch ensures that we check i_size *after* gaining a
reference to an uptodate page, thus ensuring that we don't unknowingly
return NUL characters that are beyond the end of the file.

The second ensures that we don't deliver partial reads to more than
one sub-buffer in a readv call.

These bugs have been around for a while and are quite subtle, so I
don't think there is any rush for them to go into 2.6.22.  Rather they
should sit in -mm at least until .23 opens up.

Thanks,
NeilBrown



 [PATCH 001 of 2] Fix read/truncate race.
 [PATCH 002 of 2] Make sure readv stops reading when it hits end-of-file.


[PATCH 002 of 2] Make sure readv stops reading when it hits end-of-file.

2007-06-06 Thread NeilBrown

The do_loop_readv_writev implementation of readv breaks out of the
loop as soon as a single read request didn't fill its buffer:
if (nr != len)
break;

The generic_file_aio_read version doesn't.  So if it hits EOF
before the end of the list of buffers, it will try again on the next
buffer.  If the file was extended in the meantime, this will
produce a bad result.
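A user-space sketch of the do_loop_readv_writev behaviour quoted above: fill each buffer with read() in turn and stop at the first short read, so a later buffer can never pick up data that appeared after an earlier one came up short. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read into each iovec in order; stop after the first short read.
 * Returns the total bytes read, or -1 if the very first read fails. */
static ssize_t loop_readv(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;

    for (int i = 0; i < iovcnt; i++) {
        ssize_t nr = read(fd, iov[i].iov_base, iov[i].iov_len);
        if (nr < 0)
            return total ? total : -1;
        total += nr;
        if ((size_t)nr != iov[i].iov_len)
            break;              /* short read: treat as end-of-file */
    }
    return total;
}
```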

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./mm/filemap.c |2 ++
 1 file changed, 2 insertions(+)

diff .prev/mm/filemap.c ./mm/filemap.c
--- .prev/mm/filemap.c  2007-06-07 11:34:24.0 +1000
+++ ./mm/filemap.c  2007-06-07 11:39:35.0 +1000
@@ -1206,6 +1206,8 @@ generic_file_aio_read(struct kiocb *iocb
retval = retval ?: desc.error;
break;
}
+   if (desc.count > 0)
+   break;
}
}
 out:


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Rob Landley
On Wednesday 06 June 2007 7:41 pm, H. Peter Anvin wrote:
> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.

Would this make it sane to strip the initramfs image out of vmlinux with 
objdump and replace it with another one, or are there offsets resolved during 
the build that stop that for vmlinux?

Rob
-- 
The Google cluster became self-aware at 2:14am EDT August 29, 2007...


RE: [PATCH] ACPI: Move timer broadcast and pmtimer access before C3 arbiter shutdown

2007-06-06 Thread Pallipadi, Venkatesh
 

>-Original Message-
>From: Andrew Morton [mailto:[EMAIL PROTECTED] 
>Sent: Wednesday, June 06, 2007 6:39 PM
>To: Thomas Gleixner
>Cc: Pallipadi, Venkatesh; Stable Team; LKML; Len Brown; Ingo Molnar; Arjan van de Ven; Andi Kleen; Udo A. Steinberg; Dave Jones
>Subject: Re: [PATCH] ACPI: Move timer broadcast and pmtimer access before C3 arbiter shutdown
>
>On Wed, 06 Jun 2007 11:37:53 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote:
>
>> From: Udo A. Steinberg <[EMAIL PROTECTED]>
>> 
>> The chipset doc for ICH4 says:
>> 
>> 1. In general, software should not attempt any non-posted accesses during
>> arbiter disable except to the ICH4's power management registers. This
>> implies that interrupt handlers for any unmasked hardware interrupts and
>> SMI/NMI should check ARB_DIS status before reading from ICH devices.
>> 
>> So it's not a good idea to access ICH devices after arbiter shutdown.
>> 
>> Signed-off-by: Udo A. Steinberg <[EMAIL PROTECTED]>
>> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
>> 
>> ---
>>  drivers/acpi/processor_idle.c |9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>> 
>> Index: linux-2.6.22-rc4/drivers/acpi/processor_idle.c
>> ===
>> --- linux-2.6.22-rc4.orig/drivers/acpi/processor_idle.c  2007-06-06 11:47:21.0 +0200
>> +++ linux-2.6.22-rc4/drivers/acpi/processor_idle.c   2007-06-06 11:48:21.0 +0200
>> @@ -488,6 +488,11 @@ static void acpi_processor_idle(void)
>>  
>>  case ACPI_STATE_C3:
>>  
>> +/* Get start time (ticks) */
>> +t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>> +/* Handle timer broadcast before bus arbiter shutdown ! */
>> +acpi_state_timer_broadcast(pr, cx, 1);
>> +
>>  if (pr->flags.bm_check) {
>>  if (atomic_inc_return(&c3_cpu_count) ==
>>  num_online_cpus()) {
>> @@ -502,10 +507,7 @@ static void acpi_processor_idle(void)
>>  ACPI_FLUSH_CPU_CACHE();
>>  }
>>  
>> -/* Get start time (ticks) */
>> -t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>  /* Invoke C3 */
>> -acpi_state_timer_broadcast(pr, cx, 1);
>>  acpi_cstate_enter(cx);
>>  /* Get end time (ticks) */
>>  t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>
>hm, this needs a bit of help to get it to work against Len's current tree.
>
>However, if by "non-posted accesses" you're referring to that inl(), how
>come the second one which was left in place isn't also a problem?
>

The doc says "except to the ICH4's power management registers".
It seems acpi_gbl_FADT.xpm_timer_block.address is actually OK in this
case, as it is the ACPI PM timer register.
The problem we had is the access to HPET registers inside
acpi_state_timer_broadcast(), and that is the one that has to be done
before ARB_DIS.
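The ordering argument can be illustrated with a toy userspace model. This is only a sketch, not processor_idle.c: `arb_disabled`, `pm_timer_read()`, `hpet_broadcast()`, `c3_idle_sequence()` and the `c3_violations` counter are hypothetical names, and the counter merely stands in for whatever misbehaviour a non-PM-register access during ARB_DIS causes on real hardware.

```c
#include <stdbool.h>

/* Hypothetical model of the C3 entry path; none of these names exist in the kernel. */
static bool arb_disabled;       /* models the ARB_DIS bit */
static unsigned int fake_ticks; /* models the ACPI PM timer counter */
int c3_violations;              /* non-PM-register accesses made during ARB_DIS */

/* The PM timer is an ICH4 power management register: allowed at any time. */
static unsigned int pm_timer_read(void)
{
	return ++fake_ticks;
}

/* Stands in for the HPET programming done by acpi_state_timer_broadcast(). */
static void hpet_broadcast(void)
{
	if (arb_disabled)
		c3_violations++;
}

static void arbiter_disable(void) { arb_disabled = true; }
static void arbiter_enable(void)  { arb_disabled = false; }

/*
 * broadcast_first = true follows the patched order (PM timer read and
 * broadcast before ARB_DIS); false follows the old order.
 * Returns the modelled C3 residency in PM timer ticks.
 */
unsigned int c3_idle_sequence(bool broadcast_first)
{
	unsigned int t1, t2;

	if (broadcast_first) {
		t1 = pm_timer_read();
		hpet_broadcast();
		arbiter_disable();
	} else {
		arbiter_disable();
		t1 = pm_timer_read();
		hpet_broadcast();
	}
	/* ... C3 would be entered here ... */
	t2 = pm_timer_read();   /* still a PM register, so fine under ARB_DIS */
	arbiter_enable();
	return t2 - t1;
}
```

With `broadcast_first` set, the HPET access happens before ARB_DIS and no violation is counted; the old order counts one. The PM timer reads are harmless either way, which is why the second inl() can stay where it is.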

Thanks,
Venki


[PATCH RT] enable interrupts in do_page_fault for users (take 2 - add i386 fix too)

2007-06-06 Thread Steven Rostedt
Thomas,

Can you replace my previous patch with this one? This one includes the
fix for i386.

-- Steve

Index: linux-2.6.21-rt9/arch/x86_64/mm/fault.c
===
--- linux-2.6.21-rt9.orig/arch/x86_64/mm/fault.c
+++ linux-2.6.21-rt9/arch/x86_64/mm/fault.c
@@ -476,6 +476,10 @@ bad_area:
 bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & PF_USER) {
+
+   /* it's possible to have interrupts off here */
+   local_irq_enable();
+
if (is_prefetch(regs, address, error_code))
return;
 
Index: linux-2.6.21-rt9/arch/i386/mm/fault.c
===
--- linux-2.6.21-rt9.orig/arch/i386/mm/fault.c
+++ linux-2.6.21-rt9/arch/i386/mm/fault.c
@@ -459,6 +459,10 @@ bad_area:
 bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (error_code & 4) {
+
+   /* it's possible to have interrupts off here */
+   local_irq_enable();
+
/* 
 * Valid to do another page fault here because this one came 
 * from user space.




Re: [PATCH] ACPI: Move timer broadcast and pmtimer access before C3 arbiter shutdown

2007-06-06 Thread Andrew Morton
On Wed, 06 Jun 2007 11:37:53 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> From: Udo A. Steinberg <[EMAIL PROTECTED]>
> 
> The chip set doc for IHC4 says:
> 
> 1.In general, software should not attempt any non-posted accesses during
> arbiter disable except to the ICH4's power management registers. This
> implies that interrupt handlers for any unmasked hardware interrupts and
> SMI/NMI should check ARB_DIS status before reading from ICH devices.
> 
> So it's not a good idea to access ICH devices after arbiter shut down. 
> 
> Signed-off-by: Udo A. Steinberg <[EMAIL PROTECTED]>
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> 
> ---
>  drivers/acpi/processor_idle.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6.22-rc4/drivers/acpi/processor_idle.c
> ===
> --- linux-2.6.22-rc4.orig/drivers/acpi/processor_idle.c	2007-06-06 11:47:21.0 +0200
> +++ linux-2.6.22-rc4/drivers/acpi/processor_idle.c	2007-06-06 11:48:21.0 +0200
> @@ -488,6 +488,11 @@ static void acpi_processor_idle(void)
>  
>   case ACPI_STATE_C3:
>  
> + /* Get start time (ticks) */
> + t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
> + /* Handle timer broadcast before bus arbiter shutdown ! */
> + acpi_state_timer_broadcast(pr, cx, 1);
> +
>   if (pr->flags.bm_check) {
>   if (atomic_inc_return(&c3_cpu_count) ==
>   num_online_cpus()) {
> @@ -502,10 +507,7 @@ static void acpi_processor_idle(void)
>   ACPI_FLUSH_CPU_CACHE();
>   }
>  
> - /* Get start time (ticks) */
> - t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>   /* Invoke C3 */
> - acpi_state_timer_broadcast(pr, cx, 1);
>   acpi_cstate_enter(cx);
>   /* Get end time (ticks) */
>   t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);

hm, this needs a bit of help to get it to work against Len's current tree.

However, if by "non-posted accesses" you're referring to that inl(), how
come the second one which was left in place isn't also a problem?



Re: floppy.c soft lockup

2007-06-06 Thread Matt Mackall
On Wed, Jun 06, 2007 at 10:28:28AM -0700, Andrew Morton wrote:
> On Wed, 06 Jun 2007 09:12:04 -0400 Mark Hounschell <[EMAIL PROTECTED]> wrote:
> 
> > > 
> > > As far as a 100% CPU bound task being a valid thing to do, it has been 
> > > done for many years on SMP machines. Any kernel limitation on this 
> > > surely must be considered a bug? 
> > > 
> > 
> > Could someone authoritatively comment on this? Is a SCHED_RR/SCHED_FIFO
> > 100% Cpu bound process supported in an SMP env on Linux? (vanilla or -rt)
> 
> It will kill the kernel, sorry.
> 
> The only way in which we can fix that is to allow kernel threads to preempt
> rt-priority userspace threads.  But if we were to do that (to benefit the
> few) it would cause _all_ people's rt-prio processes to experience glitches
> due to kernel activity, which we believe to be worse.
> 
> So we're between a rock and a hard place here.
> 
> If we really did want to solve this then I guess the kernel would need some
> new code to detect a 100%-busy rt-prio process and to then start permitting
> preemption of it for kernel thread activity.  That detector would need to
> be smart enough to detect a number of 100%-busy rt-prio processes which are
> yielding to each other, and one rt-prio process which keeps forking others,
> etc.  It might get tricky.

The usual alternative is to manually chrt the relevant kernel threads
to RT priority and adjust the priority scheme of their processes appropriately.
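For reference, the same thing done from C rather than with chrt, as a minimal sketch using POSIX `sched_setscheduler()`. The helper names and the "application one level below its kernel thread" policy are illustrative assumptions; finding the kernel thread's PID is left to the administrator, and raising RT priorities needs CAP_SYS_NICE.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the valid range for the given policy. */
int clamp_rt_prio(int policy, int prio)
{
	int lo = sched_get_priority_min(policy);
	int hi = sched_get_priority_max(policy);

	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/*
 * Give a (kernel or user) thread an RT policy and priority.
 * Returns 0 on success, -1 with errno set on failure
 * (EPERM without CAP_SYS_NICE).
 */
int set_rt_prio(pid_t pid, int policy, int prio)
{
	struct sched_param sp = { .sched_priority = clamp_rt_prio(policy, prio) };

	return sched_setscheduler(pid, policy, &sp);
}

/*
 * Hypothetical scheme: run the busy application one level below the
 * kernel thread it depends on, so the kthread can still preempt it.
 */
int order_rt_pair(pid_t kthread, pid_t app, int kthread_prio)
{
	if (set_rt_prio(kthread, SCHED_FIFO, kthread_prio) != 0)
		return -1;
	return set_rt_prio(app, SCHED_FIFO, kthread_prio - 1);
}
```

The clamp keeps a sloppy caller from getting EINVAL; whether one level of separation is enough depends on which kernel threads the workload actually needs.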

-- 
Mathematics is the supreme nostalgia of our time.


Re: init_task & Co.

2007-06-06 Thread Paul Mundt
On Wed, Jun 06, 2007 at 05:26:49PM -0700, Davide Libenzi wrote:
> I'm sure there's a good reason behind, but why are those variables 
> replicated in every architecture?
> Those are global variables, defined in global include files, and AFAICS 
> they could be moved in a single kernel/init_task.c file. No?
> 
Except they aren't all the same. At the very least, most architectures
don't agree on the linker section, some don't agree on alignment, and
others don't agree on initialization. Most people set the initial stack
from assembly on entry; others do it differently.

Having said that, there are likely some things that could be made
generic; it's basically just init_task and init_thread_union where there
are disagreements. The rest could likely be stashed in
kernel/init_task.c (although you'd have to carefully examine the
alignment for each architecture, particularly the ones that have special
__asm__ labels).

Perhaps having everyone use the same linker section for init_task and
moving that to asm-generic/vmlinux.lds.h is a reasonable first step.
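To make the "linker section" and "alignment" differences concrete, here is a userspace sketch of the kind of per-architecture definition in question. The `_sketch` types, the `.data.init_task` section name and the 8 KiB size are assumptions modelled loosely on i386 of this era, not a proposal for a generic kernel/init_task.c.

```c
#include <stdint.h>

#define THREAD_SIZE_SKETCH 8192   /* assumed stack size; varies by arch and config */

struct thread_info_sketch {
	int cpu;
	unsigned long flags;
};

/* Hypothetical stand-in for union thread_union: thread_info at the stack base. */
union thread_union_sketch {
	struct thread_info_sketch info;
	unsigned long stack[THREAD_SIZE_SKETCH / sizeof(unsigned long)];
};

/*
 * The two things architectures disagree on, pinned explicitly here:
 * which linker section the object lands in, and its alignment.
 */
union thread_union_sketch init_thread_union_sketch
	__attribute__((section(".data.init_task"), aligned(THREAD_SIZE_SKETCH))) = {
	.info = { .cpu = 0, .flags = 0 }
};

/*
 * i386-style current_thread_info() masks the stack pointer with
 * ~(THREAD_SIZE - 1), which only works if the union is THREAD_SIZE aligned.
 */
int init_union_is_aligned(void)
{
	return ((uintptr_t)&init_thread_union_sketch % THREAD_SIZE_SKETCH) == 0;
}
```

An architecture that gets the initial stack some other way would be free to drop the alignment, which is exactly why a single shared definition is awkward.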


Re: 2.6.22-rc4-mm1

2007-06-06 Thread Andrew Morton
On Wed, 6 Jun 2007 17:32:33 -0700 "Paul Menage" <[EMAIL PROTECTED]> wrote:

> On 6/6/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:
> >
> > (1) build for i386 with my .config
> > (2) attempt to boot in qemu's i386 system simulator
> >
> > I'm not seeing the sort of nondeterminism Andy Whitcroft is. It breaks
> > every time when I try this.
> >
> 
> Looks to be lockdep related - it's reproducible for me when I turn on
> CONFIG_LOCKDEP and the early crash goes away when I move the
> container_init_early() call to after lockdep_init().
> 

ooh, yes, lockdep_init() really does want to be called before anything
else.

So do we take it that this code hasn't been tested with lockdep?  Please
don't forget that step - lockdep finds some pretty nasty bugs sometimes.

This?

--- a/init/main.c~containersv10-basic-container-framework-fix-2
+++ a/init/main.c
@@ -503,7 +503,6 @@ asmlinkage void __init start_kernel(void
char * command_line;
extern struct kernel_param __start___param[], __stop___param[];
 
-   container_init_early();
smp_setup_processor_id();
 
/*
@@ -512,6 +511,7 @@ asmlinkage void __init start_kernel(void
 */
unwind_init();
lockdep_init();
+   container_init_early();
 
local_irq_disable();
early_boot_irqs_off();
_



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> It doesn't if we simply declare that a certain chunk of memory is
> available to it, for the case where it runs in the native configuration.
> Since it doesn't have to support *any* ELF file, just the kernel one,
> that's an option.
>   

I suppose.  But given that it's always built at the same time as - and
linked to - the kernel itself, it can have private knowledge about the
kernel.

> On the other hand, I guess with the decompressor/ELF parser being PIC,
> one would simply look for the highest used address, and relocate itself
> above that point.  It's not really all that different from what the
> decompressor does today, except that it knows the address a priori.
>   

Yes, it would have to decompress the ELF file into a temp buffer, and
then rearrange itself and the decompressed ELF file to make space for
the ELF file's final location.  Seems a bit more complex because it has
to be done in the middle of execution rather than at the start of day.  But
perhaps that doesn't matter very much.

>> I was thinking of making the ELF file entirely descriptive, since it's
>> just a set of ELF headers inserted into the existing bzImage structure,
>> and it still relies on the bzImage being built properly in the first place.
>> 
>
> Again, it's an option.  The downside is that you don't get the automatic
> test coverage of having it be exercised as often as possible.

I don't follow your argument at all.

I'm proposing the kernel take the same code path regardless of how it's
booted, with only two variations:

   1. boot all the way up from 16-bit mode, or
   2. start directly in 32-bit mode

which is essentially the current situation (setup vs code32_start).  All
I'm adding is a bit more metadata for the domain builder to work with. 
The code will get exercised on every boot in every environment, and the
metadata will be tested by whichever environment cares about it.

You're proposing that we add a third booting variation, where the
bootloader takes on the responsibility for decompressing and loading the
kernel's ELF image.  In addition, you're proposing changing the existing
32-bit portion of the boot to perform the same job as the third method,
but in a way which is not reusable by a paravirtual domain builder. 
This means that the boot path is unique for each boot environment, and
so will overall get less coverage.

Given that one axis of the test matrix - "number of subarchitectures" -
is the same in both cases, and the other axis - "number of ways of
booting" - is larger in your proposal, it seems to me that yours has
the higher testing burden.

Anyway, I added an extra pointer in the boot_params so that you can
implement it that way if you really want (no real reason you can't have
ELF within ELF within bzImage, but it starts to look a bit
engineering-by-compromise at that point).  It isn't, however, the
approach I want to take with Xen.

J


Re: [PATCH 2.6.21] cramfs: add cramfs Linear XIP

2007-06-06 Thread Justin Treon
--- Jared Hulbert <[EMAIL PROTECTED]> wrote:
> The vma->flags = 1875 = 0x753
> 
> This is:
> VM_READ
> VM_WRITE
> VM_MAYREAD
> VM_MAYEXEC
> VM_GROWSDOWN
> VM_GROWSUP
> VM_PFNMAP
> 
There was a mistake in Jared's previous post in this
thread. The vm_flags value was already in hex, i.e. 0x1875,
not decimal 1875.

The settings were:
VM_READ
VM_EXEC
VM_MAYREAD
VM_MAYWRITE
VM_MAYEXEC
VM_DENYWRITE
VM_EXECUTABLE

A possible source of the problem is that VM_PFNMAP is not set.
Thus when vm_normal_page is called it assumes the pfn is backed
by a struct page, but there is no associated struct page.
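The two readings are easy to check mechanically. A small decoder, assuming the low vm_flags bit values from include/linux/mm.h of this era (VM_READ = 0x1 through VM_EXECUTABLE = 0x1000); `decode_vm_flags()` is a hypothetical helper, not a kernel function.

```c
#include <string.h>

/* Low vm_flags bits as defined in include/linux/mm.h around 2.6.22. */
static const struct { unsigned long bit; const char *name; } vm_flag_names[] = {
	{ 0x00000001, "VM_READ" },
	{ 0x00000002, "VM_WRITE" },
	{ 0x00000004, "VM_EXEC" },
	{ 0x00000008, "VM_SHARED" },
	{ 0x00000010, "VM_MAYREAD" },
	{ 0x00000020, "VM_MAYWRITE" },
	{ 0x00000040, "VM_MAYEXEC" },
	{ 0x00000080, "VM_MAYSHARE" },
	{ 0x00000100, "VM_GROWSDOWN" },
	{ 0x00000200, "VM_GROWSUP" },
	{ 0x00000400, "VM_PFNMAP" },
	{ 0x00000800, "VM_DENYWRITE" },
	{ 0x00001000, "VM_EXECUTABLE" },
};

/* Write the space-separated names of the set flags into out (outlen >= 1). */
void decode_vm_flags(unsigned long flags, char *out, size_t outlen)
{
	size_t i;

	out[0] = '\0';
	for (i = 0; i < sizeof(vm_flag_names) / sizeof(vm_flag_names[0]); i++) {
		if (!(flags & vm_flag_names[i].bit))
			continue;
		if (out[0] != '\0')
			strncat(out, " ", outlen - strlen(out) - 1);
		strncat(out, vm_flag_names[i].name, outlen - strlen(out) - 1);
	}
}
```

Decoding 1875 (0x753) reproduces Jared's list, including VM_PFNMAP, while decoding 0x1875 reproduces the list above, with VM_PFNMAP clear.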

Justin Treon




incorrect tracking of /proc/*/exe for overwritten running processes

2007-06-06 Thread Mike Frysinger

looking at a simple program:
#include <stdio.h>
#include <unistd.h>

int main()
{
 if (fork()) return 0;
 printf("pid = %i\n", getpid());
 while (1) sleep(3600);
}

and where my / and /var/tmp are on the same partition:

# gcc test.c -o /usr/sbin/MOO
# /usr/sbin/MOO
pid = 17144
# readlink /proc/17144/exe
/usr/sbin/MOO

# gcc test.c -o /var/tmp/MOO
# mv /var/tmp/MOO /usr/sbin/MOO
# readlink /proc/17144/exe
/var/tmp/MOO (deleted)

i feel like the new exe link should actually read:
/usr/sbin/MOO (deleted)
otherwise people can easily get confused into thinking their daemon
was started in /var/tmp/ and that their machine was compromised
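Tools that rely on this link only ever see the text the kernel returns, so a monitoring script has no way to tell that the binary was really started as /usr/sbin/MOO. A minimal sketch of such a check; `proc_exe_path()` and `exe_is_deleted()` are hypothetical helpers, and note that readlink(2) does not NUL-terminate its result.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read /proc/<pid>/exe into buf as a NUL-terminated string.
 * A target longer than buflen - 1 is silently truncated.
 * Returns the length, or -1 on error (no such pid, no permission, ...).
 */
ssize_t proc_exe_path(pid_t pid, char *buf, size_t buflen)
{
	char link[64];
	ssize_t n;

	if (buflen == 0)
		return -1;
	snprintf(link, sizeof(link), "/proc/%ld/exe", (long)pid);
	n = readlink(link, buf, buflen - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';   /* readlink() does not terminate the string */
	return n;
}

/* Nonzero if the link text carries the " (deleted)" suffix. */
int exe_is_deleted(const char *path)
{
	static const char suffix[] = " (deleted)";
	size_t len = strlen(path), slen = sizeof(suffix) - 1;

	return len >= slen && strcmp(path + len - slen, suffix) == 0;
}
```

In the scenario above such a tool would report "/var/tmp/MOO", deleted, for pid 17144, which is precisely the misleading answer.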

# uname -a
Linux vapier 2.6.21.3 #4 SMP PREEMPT Sat Jun 2 09:55:10 EDT 2007
x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD
GNU/Linux
-mike


Re: USB low-speed bulk transfers

2007-06-06 Thread Andrew Morton
On Wed, 6 Jun 2007 23:20:46 +0200 "Steinar H. Gunderson" <[EMAIL PROTECTED]> 
wrote:

> [Please Cc on reply]
> 
> Hi,
> 
> I recently bought a USB MIDI interface from ESI (called “ESI MIDI Mate”). It
> claims to work with Linux, but doesn't -- I've already asked the manufacturer
> for an explanation, but as I was impatient, I hacked a bit on the drivers to
> actually make it work...
> 
> The /proc/bus/usb/devices entry looks like this:
> 
>   T:  Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  4 Spd=1.5 MxCh= 0
>   D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
>   P:  Vendor=0a92 ProdID=1001 Rev= 1.04
>   S:  Manufacturer=ESI
>   S:  Product=ESI MIDI Mate
>   C:* #Ifs= 2 Cfg#= 1 Atr=80 MxPwr= 20mA
>   I:* If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=00 Driver=snd-usb-audio
>   I:* If#= 1 Alt= 0 #EPs= 2 Cls=01(audio) Sub=03 Prot=00 Driver=snd-usb-audio
>   E:  Ad=81(I) Atr=02(Bulk) MxPS=   4 Ivl=0ms
>   E:  Ad=01(O) Atr=02(Bulk) MxPS=   4 Ivl=0ms
> 
> There are two points worth noting here:
> 
>  - The device is USB low speed. snd-usb-audio simply checks for full/high
>speed, and refuses any device that isn't. I can hack around this,
>inverting a few checks etc., and it seems to work reasonably well
>(probably since the device has no PCM parts).
>  - Both endpoint descriptors are bulk. The HCD driver plain refuses bulk
>transfers for low-speed; it looks like they are disallowed in the USB
>standard somehow. If I comment out the check, the driver works
>(perfectly!), but I guess this isn't acceptable for upstream?
> 
> Could the check for low-speed bulk transfers be replaced by a kernel warning
> somehow? I can't see any big harm in allowing them, and obviously, Windows XP
> and Mac OS X do so.
> 
> I can supply a patch for the snd-usb-audio specific parts if desired, but I
> can't guarantee it's the correct fix for cards that support PCM. Not that I
> know of any PCM-capable low-speed USB sound cards out there...
> 

(added linux-usb-devel to cc)
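For background, the USB 2.0 specification only permits control and interrupt transfers on low-speed devices, which is why bulk endpoints are refused. The "warn instead of refuse" policy Steinar suggests could be modelled as below; the enums, `check_pipe()` and the `allow_lowspeed_bulk` switch are hypothetical and do not correspond to the kernel's actual check or to any existing module parameter.

```c
#include <stdio.h>

enum usb_speed_sketch { SPEED_LOW, SPEED_FULL, SPEED_HIGH };
enum xfer_type_sketch { XFER_CONTROL, XFER_ISOC, XFER_BULK, XFER_INT };

/*
 * Returns 0 if the transfer may be submitted, -1 if it is refused.
 * With allow_lowspeed_bulk set, a low-speed bulk transfer is let
 * through with a warning instead of being rejected.
 */
int check_pipe(enum usb_speed_sketch speed, enum xfer_type_sketch type,
	       int allow_lowspeed_bulk)
{
	if (speed != SPEED_LOW)
		return 0;
	if (type == XFER_CONTROL || type == XFER_INT)
		return 0;   /* the only types the spec permits at low speed */
	if (type == XFER_BULK && allow_lowspeed_bulk) {
		fprintf(stderr, "warning: low-speed device uses a bulk "
			"endpoint (out of spec)\n");
		return 0;
	}
	return -1;          /* isochronous, or bulk without the override */
}
```

Keeping the override opt-in means spec-conforming setups behave exactly as today, while devices like this one can be made to work.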


Re: [ckrm-tech] [PATCH 00/10] Containers(V10): Generic Process Containers

2007-06-06 Thread Paul Jackson
> I suppose as a cleaner alternative we could 
> add a container_subsys->inherit_defaults() handler, to be called at
> container_clone(), and for cpusets this would set cpus and mems to
> the parent values - sibling exclusive values.  If that comes to nothing,
> then the attach_task() is still refused, and the unshare() or clone()
> fails, but this time with good reason.

Unfortunately, I haven't spent the time I should thinking about
container cloning, namespaces and such.

I don't know, for the workloads that matter to me, when, how or
if this container cloning will be used.

I'm tempted to suggest the following.

First, I am assuming that the classic method of creating cpuset
children will still work, such as the following (which can fail
for certain combinations of exclusive cpus or mems):
cd /dev/cpuset/foobar
mkdir foochild
cp cpus foochild
cp mems foochild
echo $$ > foochild/tasks
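The same classic sequence expressed in C, as a sketch that assumes the cpuset filesystem is mounted and that `parent` is the parent cpuset directory (e.g. /dev/cpuset/foobar); `make_child_cpuset()` and `copy_file()` are hypothetical helpers. As in the shell version, an exclusive-flag conflict would show up as a failed write when copying cpus or mems.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the contents of src to dst, like "cp src dstdir/". */
static int copy_file(const char *src, const char *dst)
{
	char buf[4096];
	size_t n;
	FILE *in = fopen(src, "r");
	FILE *out;

	if (!in)
		return -1;
	out = fopen(dst, "w");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			fclose(in);
			fclose(out);
			return -1;
		}
	}
	fclose(in);
	/* Buffered data reaches the file here, so a rejected value fails here. */
	return fclose(out) == 0 ? 0 : -1;
}

/*
 * C equivalent of:
 *   cd <parent>; mkdir <child>; cp cpus <child>; cp mems <child>;
 *   echo <pid> > <child>/tasks
 */
int make_child_cpuset(const char *parent, const char *child, pid_t pid)
{
	static const char *const files[] = { "cpus", "mems" };
	char src[512], dst[512];
	FILE *tasks;
	size_t i;

	snprintf(dst, sizeof(dst), "%s/%s", parent, child);
	if (mkdir(dst, 0755) != 0)
		return -1;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		snprintf(src, sizeof(src), "%s/%s", parent, files[i]);
		snprintf(dst, sizeof(dst), "%s/%s/%s", parent, child, files[i]);
		if (copy_file(src, dst) != 0)
			return -1;
	}

	snprintf(dst, sizeof(dst), "%s/%s/tasks", parent, child);
	tasks = fopen(dst, "w");
	if (!tasks)
		return -1;
	fprintf(tasks, "%ld\n", (long)pid);
	return fclose(tasks) == 0 ? 0 : -1;
}
```

Called as `make_child_cpuset("/dev/cpuset/foobar", "foochild", getpid())`, it mirrors the shell steps one for one.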

Second, given that, how about you fail the unshare() or clone()
anytime that the instance to be cloned has any sibling cpusets
with any exclusive flags set.

The exclusive property is not really on friendly terms with cloning.

Now if the above classic code must be encoded using cloning under
the covers, then we've got problems, probably more problems than
just this.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> Certainly, but much harder to implement.  The ELF parser needs to be
> prepared to move itself around to get out of the way of the ELF file. 
> It's a fairly large change from how it works now.
> 

It doesn't if we simply declare that a certain chunk of memory is
available to it, for the case where it runs in the native configuration.
Since it doesn't have to support *any* ELF file, just the kernel one,
that's an option.

On the other hand, I guess with the decompressor/ELF parser being PIC,
one would simply look for the highest used address, and relocate itself
above that point.  It's not really all that different from what the
decompressor does today, except that it knows the address a priori.
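The "look for the highest used address" step amounts to a scan of the kernel's PT_LOAD program headers. A minimal sketch using the standard <elf.h> types, assuming a 32-bit image already in memory and that physical load addresses (p_paddr) are what matter; `elf_highest_addr()` is a hypothetical helper, not code from this patch series.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the first address above every PT_LOAD segment, using
 * p_paddr + p_memsz so that .bss is included, rounded up to 'align'
 * (a power of two). A PIC decompressor/ELF parser could relocate
 * itself to or above this point before unpacking the kernel.
 */
uint32_t elf_highest_addr(const Elf32_Phdr *phdr, size_t phnum, uint32_t align)
{
	uint32_t top = 0;
	size_t i;

	for (i = 0; i < phnum; i++) {
		uint32_t end;

		if (phdr[i].p_type != PT_LOAD)
			continue;   /* notes etc. occupy no memory at run time */
		end = phdr[i].p_paddr + phdr[i].p_memsz;
		if (end > top)
			top = end;
	}
	return (top + align - 1) & ~(align - 1);
}
```

Using p_memsz rather than p_filesz matters: the zero-filled .bss tail would otherwise be overwritten by the relocated parser.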

> I was thinking of making the ELF file entirely descriptive, since it's
> just a set of ELF headers inserted into the existing bzImage structure,
> and it still relies on the bzImage being built properly in the first place.

Again, it's an option.  The downside is that you don't get the automatic
test coverage of having it be exercised as often as possible.

-hpa



Re: [PATCH] Audit: Add TTY input auditing

2007-06-06 Thread Andrew Morton
On Wed, 06 Jun 2007 12:10:28 +0200 Miloslav Trmac <[EMAIL PROTECTED]> wrote:

> From: Miloslav Trmac <[EMAIL PROTECTED]>
> 
> Add TTY input auditing, used to audit system administrator's actions.
> TTY input auditing works on a higher level than auditing all system
> calls within the session, which would produce an overwhelming amount of
> mostly useless audit events.
> 
> Add an "audit_tty" attribute, inherited across fork ().  Data read from
> TTYs by a process with the attribute is sent to the audit subsystem by the
> kernel.  The audit netlink interface is extended to allow modifying the
> audit_tty attribute, and to allow sending explanatory audit events from
> user-space (for example, a shell might send an event containing the
> final command, after the interactive command-line editing and history
> expansion is performed, which might be difficult to decipher from the
> TTY input alone).
> 
> Because the "audit_tty" attribute is inherited across fork (), it would
> be set e.g. for sshd restarted within an audited session.  To prevent
> this, the audit_tty attribute is cleared when a process with no open TTY
> file descriptors (e.g. after daemon startup) opens a TTY.
> 
> See https://www.redhat.com/archives/linux-audit/2007-June/msg0.html
> for a more detailed rationale document for an older version of this patch.
> 
> ...
>
> +static void
> +tty_audit_buf_free(struct tty_audit_buf *buf)
> +{

The usual kernel style is

static void tty_audit_buf_free(struct tty_audit_buf *buf)
{

and the style which you've used here is usually only employed if its use
prevents an 80-column overflow.

There are plenty of exceptions to this, and I understand (and actually
agree with) the reason for the style which you've chosen, but
standardisation wins out.

The patch adds a lot of new code to n_tty.c, I suspect it would be neater
to put it all into a new file if possible?

> +/**
> + *   tty_audit_exit  -   Handle a task exit
> + *
> + *   Make sure all buffered data is written out and deallocate the buffer.
> + *   Only needs to be called if current->signal->tty_audit_buf != %NULL.
> + */
> +void
> +tty_audit_exit(void)
> +{
> + struct tty_audit_buf *buf;
> +
> + spin_lock(&current->sighand->siglock);

I think you have a bug here.  ->siglock is taken elsewhere in an irq-safe
fashion (multiple instances), so it needs to be taken irq-safely here too.

> +/**
> + *   tty_audit_add_data  -   Add data for TTY auditing.
> + *
> + *   Audit @data of @size from @tty, if necessary.
> + */
> +static void
> +tty_audit_add_data(struct tty_struct *tty, unsigned char *data, size_t size)
> +{
> + struct tty_audit_buf *buf;
> + int major, minor;
> +
> + if (unlikely(size == 0))
> + return;
> +
> + buf = tty_audit_buf_get(tty);
> + if (!buf)
> + return;
> +
> + mutex_lock(&buf->mutex);
> + major = tty->driver->major;
> + minor = tty->driver->minor_start + tty->index;
> + if (buf->major != major || buf->minor != minor
> + || buf->icanon != tty->icanon) {
> + tty_audit_buf_push_current(buf);
> + buf->major = major;
> + buf->minor = minor;
> + buf->icanon = tty->icanon;
> + }
> + do {
> +   size_t run;
> +
> +   run = N_TTY_BUF_SIZE - buf->valid;
> +   if (run > size)
> + run = size;
> +   memcpy(buf->data + buf->valid, data, run);
> +   buf->valid += run;
> +   data += run;
> +   size -= run;
> +   if (buf->valid == N_TTY_BUF_SIZE)
> +   tty_audit_buf_push_current(buf);
> + } while (size != 0);

the whitespace went bad here.
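Stripped of the locking, the TTY-change check and the actual audit record, the loop above just fills a fixed-size buffer and flushes it whenever it is full. A userspace sketch of that shape; `audit_buf_sketch`, `audit_add_data()`, `buf_push()` and the 8-byte buffer are hypothetical stand-ins (the patch uses N_TTY_BUF_SIZE).

```c
#include <stddef.h>
#include <string.h>

#define AUDIT_BUF_SIZE 8   /* tiny for illustration */

/* Hypothetical stand-in for struct tty_audit_buf, without the mutex. */
struct audit_buf_sketch {
	unsigned char data[AUDIT_BUF_SIZE];
	size_t valid;   /* bytes currently buffered */
	int pushes;     /* records emitted so far */
};

/* Emit the buffered bytes as one record (here: just count it) and reset. */
static void buf_push(struct audit_buf_sketch *buf)
{
	if (buf->valid == 0)
		return;
	buf->pushes++;
	buf->valid = 0;
}

/* Same loop shape as tty_audit_add_data(): fill, flush when full, repeat. */
void audit_add_data(struct audit_buf_sketch *buf, const unsigned char *data,
		    size_t size)
{
	while (size != 0) {
		size_t run = AUDIT_BUF_SIZE - buf->valid;

		if (run > size)
			run = size;
		memcpy(buf->data + buf->valid, data, run);
		buf->valid += run;
		data += run;
		size -= run;
		if (buf->valid == AUDIT_BUF_SIZE)
			buf_push(buf);
	}
}
```

Feeding 20 bytes emits two full records and leaves 4 bytes buffered until more input, a TTY change or task exit pushes them out.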

> + mutex_unlock(&buf->mutex);
> + tty_audit_buf_put(buf);
> +}
> +
>
> ...
>
> +
> +/* For checking whether a file is a TTY */
> +extern ssize_t tty_read(struct file * file, char __user * buf, size_t count,
> + loff_t *ppos);

Nope, please don't add extern declarations to C files.  Do it via header
files.

> +/**
> + *   tty_audit_opening   -   A TTY is being opened.
> + *
> + *   As a special hack, tasks that close all their TTYs and open new ones
> + *   are assumed to be system daemons (e.g. getty) and auditing is
> + *   automatically disabled for them.
> + */
> +void
> +tty_audit_opening(void)
> +{
> + int disable;
> +
> + disable = 1;
> + spin_lock(&current->sighand->siglock);
> + if (current->signal->audit_tty == 0)
> + disable = 0;
> + spin_unlock(&current->sighand->siglock);
> + if (!disable)
> + return;
> +
> + task_lock(current);
> + if (current->files) {
> + struct fdtable *fdt;
> + unsigned i;
> +
> + /*
> +  * We don't take a ref to the file, so we must hold ->file_lock
> +  * instead.
> +  */
> + spin_lock(&current->files->file_lock);

So we make file_lock nest inside task_lock().  Was that lock ranking
already being used elsewhere in the kernel, or is it a new association?

Has this code had full coverage testing 

[PATCH] [x86] remove CONFIG_X86_TSC

2007-06-06 Thread H. Peter Anvin
CONFIG_X86_TSC makes the TSC mandatory, but since the TSC may be
unstable, we still have to be able to operate without it.
Furthermore, with CONFIG_X86_GENERIC we still compile in the RDTSC
instructions.

In the end, the only significant effect it has is that it makes the
"notsc" flag inoperable, which is silly if we have the code to run
without TSC anyway.

Thus, remove this flag and all its conditionals.

Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
 arch/i386/Kconfig.cpu       |    5 -----
 arch/i386/defconfig         |    1 -
 arch/i386/kernel/cpu/bugs.c |    8 --------
 arch/i386/kernel/tsc.c      |    9 ---------
 arch/um/defconfig           |    1 -
 arch/x86_64/Kconfig         |    4 ----
 arch/x86_64/defconfig       |    1 -
 include/asm-i386/tsc.h      |    4 ----
 8 files changed, 0 insertions(+), 33 deletions(-)

diff --git a/arch/i386/Kconfig.cpu b/arch/i386/Kconfig.cpu
index d7f6fb0..14d7ee8 100644
--- a/arch/i386/Kconfig.cpu
+++ b/arch/i386/Kconfig.cpu
@@ -332,11 +332,6 @@ config X86_OOSTORE
depends on (MWINCHIP3D || MWINCHIP2 || MWINCHIPC6) && MTRR
default y
 
-config X86_TSC
-   bool
-   depends on (MWINCHIP3D || MWINCHIP2 || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2) && !X86_NUMAQ
-   default y
-
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
 config X86_CMOV
diff --git a/arch/i386/defconfig b/arch/i386/defconfig
index 1a3a221..0c78cae 100644
--- a/arch/i386/defconfig
+++ b/arch/i386/defconfig
@@ -170,7 +170,6 @@ CONFIG_X86_CMPXCHG64=y
 CONFIG_X86_GOOD_APIC=y
 CONFIG_X86_INTEL_USERCOPY=y
 CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_TSC=y
 CONFIG_X86_CMOV=y
 CONFIG_X86_MINIMUM_CPU_FAMILY=4
 CONFIG_HPET_TIMER=y
diff --git a/arch/i386/kernel/cpu/bugs.c b/arch/i386/kernel/cpu/bugs.c
index 54428a2..665bcd4 100644
--- a/arch/i386/kernel/cpu/bugs.c
+++ b/arch/i386/kernel/cpu/bugs.c
@@ -151,14 +151,6 @@ static void __init check_config(void)
 #endif
 
 /*
- * If we configured ourselves for a TSC, we'd better have one!
- */
-#ifdef CONFIG_X86_TSC
-   if (!cpu_has_tsc && !tsc_disable)
-   panic("Kernel compiled for Pentium+, requires TSC feature!");
-#endif
-
-/*
  * If we were told we had a good local APIC, check for buggy Pentia,
  * i.e. all B steppings and the C2 stepping of P54C when using their
  * integrated APIC (see 11AP erratum in "Pentium Processor
diff --git a/arch/i386/kernel/tsc.c b/arch/i386/kernel/tsc.c
index f64b81f..fdad18c 100644
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@@ -29,14 +29,6 @@ unsigned int tsc_khz;
 
 int tsc_disable;
 
-#ifdef CONFIG_X86_TSC
-static int __init tsc_setup(char *str)
-{
-   printk(KERN_WARNING "notsc: Kernel compiled with CONFIG_X86_TSC, "
-   "cannot disable TSC.\n");
-   return 1;
-}
-#else
 /*
  * disable flag for tsc. Takes effect by clearing the TSC cpu flag
  * in cpu/common.c
@@ -47,7 +39,6 @@ static int __init tsc_setup(char *str)
 
return 1;
 }
-#endif
 
 __setup("notsc", tsc_setup);
 
diff --git a/arch/um/defconfig b/arch/um/defconfig
index a54d0ef..eeb627c 100644
--- a/arch/um/defconfig
+++ b/arch/um/defconfig
@@ -55,7 +55,6 @@ CONFIG_X86_POPAD_OK=y
 CONFIG_X86_CMPXCHG64=y
 CONFIG_X86_GOOD_APIC=y
 CONFIG_X86_USE_PPRO_CHECKSUM=y
-CONFIG_X86_TSC=y
 CONFIG_UML_X86=y
 # CONFIG_64BIT is not set
 CONFIG_SEMAPHORE_SLEEPERS=y
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 5ce9443..bf2227a 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -201,10 +201,6 @@ config X86_INTERNODE_CACHE_BYTES
default "4096" if X86_VSMP
default X86_L1_CACHE_BYTES if !X86_VSMP
 
-config X86_TSC
-   bool
-   default y
-
 config X86_GOOD_APIC
bool
default y
diff --git a/arch/x86_64/defconfig b/arch/x86_64/defconfig
index 40178e5..20d972d 100644
--- a/arch/x86_64/defconfig
+++ b/arch/x86_64/defconfig
@@ -129,7 +129,6 @@ CONFIG_GENERIC_CPU=y
 CONFIG_X86_L1_CACHE_BYTES=128
 CONFIG_X86_L1_CACHE_SHIFT=7
 CONFIG_X86_INTERNODE_CACHE_BYTES=128
-CONFIG_X86_TSC=y
 CONFIG_X86_GOOD_APIC=y
 # CONFIG_MICROCODE is not set
 CONFIG_X86_MSR=y
diff --git a/include/asm-i386/tsc.h b/include/asm-i386/tsc.h
index 62c091f..36629ac 100644
--- a/include/asm-i386/tsc.h
+++ b/include/asm-i386/tsc.h
@@ -20,14 +20,10 @@ static inline cycles_t get_cycles(void)
 {
unsigned long long ret = 0;
 
-#ifndef CONFIG_X86_TSC
if (!cpu_has_tsc)
return 0;
-#endif
 
-#if defined(CONFIG_X86_GENERIC) || defined(CONFIG_X86_TSC)
rdtscll(ret);
-#endif
return ret;
 }
 
-- 
1.5.2
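After this patch, get_cycles() is a plain runtime test followed by RDTSC. The same shape in userspace, as a sketch assuming GCC or Clang on x86 (`__get_cpuid()` from <cpuid.h>, `__rdtsc()` from <x86intrin.h>); the function names are hypothetical, and the kernel's cpu_has_tsc additionally reflects the "notsc" override, which this sketch does not model.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/* Runtime TSC check: CPUID leaf 1, EDX bit 4. */
int cpu_has_tsc_sketch(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;   /* no CPUID, or leaf 1 unsupported */
	return (edx >> 4) & 1;
#else
	return 0;
#endif
}

/* Post-patch shape of get_cycles(): no config #ifdefs, just a runtime test. */
uint64_t get_cycles_sketch(void)
{
#if defined(__i386__) || defined(__x86_64__)
	if (!cpu_has_tsc_sketch())
		return 0;
	return __rdtsc();
#else
	return 0;
#endif
}
```

The cost of the check is a predictable branch, which is the trade the patch makes in exchange for a working "notsc".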


[PATCH 2.6.22-rc4] USB: add support for TRU-install (C) and new VID/PIDs to Sierra Wireless driver (ATTEMPT 2)

2007-06-06 Thread Kevin Lloyd

From: Kevin Lloyd <[EMAIL PROTECTED]>

This patch is derived from the 2.6.22-rc4 kernel source. It adds support for
the new TRU-install (C) feature (without this support, new devices will not
work) and adds new UMTS device VID/PIDs.

There was a previous submission on Tuesday, June 5th; it was pointed out that
it was targeted at an outdated kernel, so please use this one instead.
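Routing by `.driver_info` works because each table entry carries an extra value alongside the VID/PID. A userspace sketch of the match, with heavy assumptions: the real DEVICE_1_PORT and DEVICE_INSTALLER values live in the new sierra.h, which is not shown, so the `_SKETCH` constants, `usb_id_sketch` and `match_id()` are hypothetical, and entries without driver_info simply carry 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real ones are defined in the patch's sierra.h. */
enum { DEVICE_DEFAULT_SKETCH = 0, DEVICE_1_PORT_SKETCH = 1, DEVICE_INSTALLER_SKETCH = 2 };

struct usb_id_sketch {
	uint16_t vendor, product;
	unsigned long driver_info;   /* 0 when the entry does not set it */
};

/* A few entries mirroring the patch's table; the all-zero entry terminates it. */
static const struct usb_id_sketch id_table_sketch[] = {
	{ 0x1199, 0x0018, DEVICE_DEFAULT_SKETCH },    /* MC5720 */
	{ 0x0f30, 0x1b1d, DEVICE_DEFAULT_SKETCH },    /* MC5720 */
	{ 0x1199, 0x6812, DEVICE_DEFAULT_SKETCH },    /* MC8775 & AC 875U */
	{ 0x1199, 0x0112, DEVICE_1_PORT_SKETCH },     /* AirCard 580 */
	{ 0x0F3D, 0x0112, DEVICE_1_PORT_SKETCH },     /* Airprime/Sierra PC 5220 */
	{ 0x1199, 0x0FFF, DEVICE_INSTALLER_SKETCH },  /* installer-mode device */
	{ 0, 0, 0 }
};

/* Return the matching entry, or NULL if the driver should not bind. */
const struct usb_id_sketch *match_id(uint16_t vendor, uint16_t product)
{
	const struct usb_id_sketch *id;

	for (id = id_table_sketch; id->vendor || id->product; id++)
		if (id->vendor == vendor && id->product == product)
			return id;
	return NULL;
}
```

One table with per-entry data replaces the separate 1-port and 3-port tables that the patch removes.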

Thanks

Signed-off-by: Kevin Lloyd <[EMAIL PROTECTED]>

---

diff -uprN linux-2.6.22-rc4/drivers/usb/serial/sierra.c linux-2.6.22-rc4-swoc/drivers/usb/serial/sierra.c
--- linux-2.6.22-rc4/drivers/usb/serial/sierra.c	2007-06-06 10:53:20.0 -0700
+++ linux-2.6.22-rc4-swoc/drivers/usb/serial/sierra.c	2007-06-06 11:14:26.0 -0700
@@ -1,6 +1,8 @@
/*
  USB Driver for Sierra Wireless

+  Targeted for 2.6.22 kernel
+
  Copyright (C) 2006  Kevin Lloyd <[EMAIL PROTECTED]>

  IMPORTANT DISCLAIMER: This driver is not commercially supported by
@@ -15,9 +17,9 @@

*/

-#define DRIVER_VERSION "v.1.0.6"
+#define DRIVER_VERSION "v.1.2.3"
#define DRIVER_AUTHOR "Kevin Lloyd <[EMAIL PROTECTED]>"
-#define DRIVER_DESC "USB Driver for Sierra Wireless USB modems"
+#define DRIVER_DESC "USB Driver for Sierra Wireless modems"

#include 
#include 
@@ -28,66 +30,53 @@
#include 
#include 

+#include "sierra.h"
+
+static int debug;

static struct usb_device_id id_table [] = {
{ USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
{ USB_DEVICE(0x1199, 0x0018) }, /* Sierra Wireless MC5720 */
+   { USB_DEVICE(0x0f30, 0x1b1d) }, /* Sierra Wireless MC5720 */
{ USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
{ USB_DEVICE(0x1199, 0x0020) }, /* Sierra Wireless MC5725 */
{ USB_DEVICE(0x1199, 0x0019) }, /* Sierra Wireless AirCard 595 */
-   { USB_DEVICE(0x1199, 0x0120) }, /* Sierra Wireless AirCard 595U */
{ USB_DEVICE(0x1199, 0x0021) }, /* Sierra Wireless AirCard 597E */
+   { USB_DEVICE(0x1199, 0x0120) }, /* Sierra Wireless USB Dongle 595U*/
+   
{ USB_DEVICE(0x1199, 0x6802) }, /* Sierra Wireless MC8755 */
{ USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 */
{ USB_DEVICE(0x1199, 0x6803) }, /* Sierra Wireless MC8765 */
-   { USB_DEVICE(0x1199, 0x6812) }, /* Sierra Wireless MC8775 */
+   { USB_DEVICE(0x1199, 0x6812) }, /* Sierra Wireless MC8775 & AC 875U */
{ USB_DEVICE(0x1199, 0x6820) }, /* Sierra Wireless AirCard 875 */
-
-   { USB_DEVICE(0x1199, 0x0112) }, /* Sierra Wireless AirCard 580 */
-   { USB_DEVICE(0x0F3D, 0x0112) }, /* AirPrime/Sierra PC 5220 */
+   { USB_DEVICE(0x1199, 0x6832) }, /* Sierra Wireless MC8780*/
+   { USB_DEVICE(0x1199, 0x6833) }, /* Sierra Wireless MC8781*/
+   { USB_DEVICE(0x1199, 0x6850) }, /* Sierra Wireless AirCard 880 */
+   { USB_DEVICE(0x1199, 0x6851) }, /* Sierra Wireless AirCard 881 */
+   { USB_DEVICE(0x1199, 0x6852) }, /* Sierra Wireless AirCard 880 E */
+   { USB_DEVICE(0x1199, 0x6853) }, /* Sierra Wireless AirCard 881 E */
+
+   { USB_DEVICE(0x1199, 0x0112), .driver_info = DEVICE_1_PORT }, /* Sierra Wireless AirCard 580 */
+   { USB_DEVICE(0x0F3D, 0x0112), .driver_info = DEVICE_1_PORT }, /* Airprime/Sierra PC 5220 */
+   
+	 
+	{ USB_DEVICE(0x1199, 0x0FFF), .driver_info = DEVICE_INSTALLER},	

{ }
};
MODULE_DEVICE_TABLE(usb, id_table);

-static struct usb_device_id id_table_1port [] = {
-   { USB_DEVICE(0x1199, 0x0112) }, /* Sierra Wireless AirCard 580 */
-   { USB_DEVICE(0x0F3D, 0x0112) }, /* AirPrime/Sierra PC 5220 */
-   { }
-};
-
-static struct usb_device_id id_table_3port [] = {
-   { USB_DEVICE(0x1199, 0x0017) }, /* Sierra Wireless EM5625 */
-   { USB_DEVICE(0x1199, 0x0018) }, /* Sierra Wireless MC5720 */
-   { USB_DEVICE(0x1199, 0x0218) }, /* Sierra Wireless MC5720 */
-   { USB_DEVICE(0x1199, 0x0020) }, /* Sierra Wireless MC5725 */
-   { USB_DEVICE(0x1199, 0x0019) }, /* Sierra Wireless AirCard 595 */
-   { USB_DEVICE(0x1199, 0x0120) }, /* Sierra Wireless AirCard 595U */
-   { USB_DEVICE(0x1199, 0x0021) }, /* Sierra Wireless AirCard 597E */
-   { USB_DEVICE(0x1199, 0x6802) }, /* Sierra Wireless MC8755 */
-   { USB_DEVICE(0x1199, 0x6804) }, /* Sierra Wireless MC8755 */
-   { USB_DEVICE(0x1199, 0x6803) }, /* Sierra Wireless MC8765 */
-   { USB_DEVICE(0x1199, 0x6812) }, /* Sierra Wireless MC8775 */
-   { USB_DEVICE(0x1199, 0x6820) }, /* Sierra Wireless AirCard 875 */
-   { }
-};
+int sierra_probe(struct usb_interface *iface, const struct usb_device_id *id);
+int sierra_set_power_state(struct usb_device *udev, unsigned long swiState);
+int sierra_set_ms_mode(struct usb_device *udev, SWIMS_SET_MODE_VALUE eSocMode);

static struct usb_driver sierra_driver = {
.name   = "sierra",
-   .probe  = usb_serial_probe,
+   .probe  = sierra_probe,
.disconnect = usb_serial_disconnect,
.id_table   = id_table,
.no_dynamic_id =  

Re: [PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv

2007-06-06 Thread Michael Ellerman
On Wed, 2007-06-06 at 20:53 +0200, Thomas Klein wrote:
> This patch fixes a possible kernel panic due to not checking the vlan group
> when processing received VLAN packets and a malfunction in VLAN/hypervisor
> registration.
> 
> 
> Signed-off-by: Thomas Klein <[EMAIL PROTECTED]>
> ---
> 
> 
> diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea.h patched_kernel/drivers/net/ehea/ehea.h
> --- linux-2.6.22-rc4/drivers/net/ehea/ehea.h	2007-06-05 02:57:25.0 +0200
> +++ patched_kernel/drivers/net/ehea/ehea.h	2007-06-06 12:53:58.0 +0200
> @@ -39,7 +39,7 @@
>  #include 
>  
>  #define DRV_NAME "ehea"
> -#define DRV_VERSION  "EHEA_0061"
> +#define DRV_VERSION  "EHEA_0064"
>  
>  #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
>   | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
> diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
> --- linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c	2007-06-05 02:57:25.0 +0200
> +++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-06-06 12:53:58.0 +0200
> @@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct 
>   }
>  
>   index = (vid / 64);
> - cb1->vlan_filter[index] |= ((u64)(1 << (vid & 0x3F)));
> + cb1->vlan_filter[index] |= ((u64)(0x8000 >> (vid & 0x3F)));
>  
>   hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
>  H_PORT_CB1, H_PORT_CB1_ALL, cb1);
> @@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct
>   }
>  
>   index = (vid / 64);
> - cb1->vlan_filter[index] &= ~((u64)(1 << (vid & 0x3F)));
> + cb1->vlan_filter[index] &= ~((u64)(0x8000 >> (vid & 0x3F)));

These two seem ripe for splitting out into some sort of helper routine,
which would leave only one place to get it right.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Davide Libenzi
On Thu, 7 Jun 2007, Arnd Bergmann wrote:

> On Thursday 07 June 2007, Davide Libenzi wrote:
> > The sys_socketcall() system call has been also changed to support
> > a new SYS_SOCKET2 identifier.
> 
> I thought the general agreement was that sys_socketcall is a bad
> idea to start with. Is there any benefit in adding new calls to
> it instead of using a new system call number for sys_socket2 on
> all architectures?

Ohh, I didn't know it was flagged as "bad" ;) I actually had it that way,
but then I noticed there was no __NR_socket, so I conformed to the
existing way of doing it.



- Davide




Re: [RFC] [Patch 1/4] statistics: no include hell for users

2007-06-06 Thread Martin Peschke

Dave Hansen wrote:

On Wed, 2007-06-06 at 23:33 +0200, Martin Peschke wrote:

 struct statistic_interface {
 /* private: */
struct list_head list;
-   struct dentry   *debugfs_dir;
-   struct dentry   *data_file;
-   struct dentry   *def_file;
+   void *debugfs_dir;
+   void *data_file;
+   void *def_file;


If you don't actually dereference the pointer, you should just be able
to declare:

struct dentry;

and be done with it, right?  You don't _need_ the includes to have just
pointers.


Ah, looks like an established trick in kernel include files.
I guess I can revert the other, seq_file related change then as well.
Thank you. Will change my local copy.

Martin



Re: 2.6.22-rc4-mm1

2007-06-06 Thread Paul Menage

On 6/6/07, William Lee Irwin III <[EMAIL PROTECTED]> wrote:


(1) build for i386 with my .config
(2) attempt to boot in qemu's i386 system simulator

I'm not seeing the sort of nondeterminism Andy Whitcroft is. It breaks
every time when I try this.



Looks to be lockdep related - it's reproducible for me when I turn on
CONFIG_LOCKDEP and the early crash goes away when I move the
container_init_early() call to after lockdep_init().

Paul


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Arnd Bergmann
On Thursday 07 June 2007, Davide Libenzi wrote:
> The sys_socketcall() system call has been also changed to support
> a new SYS_SOCKET2 identifier.

I thought the general agreement was that sys_socketcall is a bad
idea to start with. Is there any benefit in adding new calls to
it instead of using a new system call number for sys_socket2 on
all architectures?

Arnd <><


[BUG] sysrq-m oops

2007-06-06 Thread john stultz
Hey All,
With 2.6.21 and the current -git, we're seeing the following oops when
we try sysrq-m:

...
Node 1 Normal: 85*4kB 34*8kB 20*16kB 4*32kB 3*64kB 0*128kB 1*256kB
0*512kB 1*1024kB 0*2048kB 953*4096kB =
3906020kB 
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 2040212kB
Total swap = 2040212kB
Unable to handle kernel paging request at 0348 RIP: 
 [] show_mem+0xdd/0x1e0
PGD 2052e9067 PUD 2052ea067 PMD 0 
Oops:  [1] PREEMPT SMP 
CPU 3 
Pid: 0, comm: swapper Not tainted 2.6.22-rc3-git7john #6
RIP: 0010:[]  [] show_mem+0xdd/0x1e0
RSP: 0018:810211f8bdf8  EFLAGS: 00010006
RAX: 0078 RBX: 000f RCX: 0348
RDX: 0348 RSI: 81000210a0f0 RDI: 00112000
RBP: 000f R08:  R09: 
R10:  R11:  R12: 81012189
R13: 7179 R14:  R15: 
FS:  2ba3b8bdbf40() GS:810211f629c0()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0348 CR3: 0002052e8000 CR4: 06e0
Process swapper (pid: 0, threadinfo 8101118be000, task
810211f70080)
Stack:   81a90460 0001
006d
 810110bf6000 0006 0002 812e0046
 81011584fbc0 820cc590 0061 006d
Call Trace:
   [] __handle_sysrq+0x86/0x140
 [] receive_chars+0x27c/0x300
 [] hrtimer_wakeup+0x0/0x30
 [] clocksource_get_next+0x47/0x60
 [] serial8250_interrupt+0x142/0x160
 [] handle_IRQ_event+0x34/0x70
 [] handle_edge_irq+0xca/0x150
 [] do_IRQ+0xbd/0x1b0
 [] default_idle+0x0/0x40
 [] ret_from_intr+0x0/0xa
   [] unix_poll+0x0/0xb0
 [] default_idle+0x29/0x40
 [] cpu_idle+0x6f/0xe0


Code: 8b 02 f6 c4 04 75 92 8b 02 66 85 c0 79 7a 48 83 c5 01 49 83 
RIP  [] show_mem+0xdd/0x1e0
 RSP 
CR2: 0348
Kernel panic - not syncing: Aiee, killing interrupt handler!



I'll keep digging but I wanted to see if anyone had any quick thoughts
or suggestions.

thanks
-john




init_task & Co.

2007-06-06 Thread Davide Libenzi

I'm sure there's a good reason behind it, but why are those variables
replicated in every architecture?
Those are global variables, defined in global include files, and AFAICS
they could be moved into a single kernel/init_task.c file. No?



- Davide



Re: [PATCH] CIFS should honour umask

2007-06-06 Thread Steve French

Thanks - it looks almost right, but you missed the mknod case and your
patch had some whitespace/formatting problems.

Could you try the following and make sure it works for you?  If so, I will merge it.

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index f085db9..8e86aac 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -208,7 +208,8 @@ cifs_create(struct inode *inode, struct
   /* If Open reported that we actually created a file
   then we now have to set the mode if possible */
   if ((cifs_sb->tcon->ses->capabilities & CAP_UNIX) &&
-   (oplock & CIFS_CREATE_ACTION))
+   (oplock & CIFS_CREATE_ACTION)) {
+   mode &= ~current->fs->umask;
   if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) {
   CIFSSMBUnixSetPerms(xid, pTcon, full_path, mode,
   (__u64)current->fsuid,
@@ -226,7 +227,7 @@ cifs_create(struct inode *inode, struct
   cifs_sb->mnt_cifs_flags &
   CIFS_MOUNT_MAP_SPECIAL_CHR);
   }
-   else {
+   } else {
   /* BB implement mode setting via Windows security
  descriptors e.g. */
   /* CIFSSMBWinSetPerms(xid,pTcon,path,mode,-1,-1,nls);*/
@@ -336,6 +337,7 @@ int cifs_mknod(struct inode *inode, stru
   if (full_path == NULL)
   rc = -ENOMEM;
   else if (pTcon->ses->capabilities & CAP_UNIX) {
+   mode &= ~current->fs->umask;
   if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) {
   rc = CIFSSMBUnixSetPerms(xid, pTcon, full_path,
   mode, (__u64)current->fsuid,
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 3e87dad..f0ff12b 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -986,7 +986,8 @@ mkdir_get_info:
 * failed to get it from the server or was set bogus */
   if ((direntry->d_inode) && (direntry->d_inode->i_nlink < 2))
   direntry->d_inode->i_nlink = 2;
-   if (cifs_sb->tcon->ses->capabilities & CAP_UNIX)
+   if (cifs_sb->tcon->ses->capabilities & CAP_UNIX) {
+   mode &= ~current->fs->umask;
   if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SET_UID) {
   CIFSSMBUnixSetPerms(xid, pTcon, full_path,
   mode,
@@ -1004,7 +1005,7 @@ mkdir_get_info:
   cifs_sb->mnt_cifs_flags &
   CIFS_MOUNT_MAP_SPECIAL_CHR);
   }
-   else {
+   } else {
   /* BB to be implemented via Windows secrty descriptors
  eg CIFSSMBWinSetPerms(xid, pTcon, full_path, mode,
-1, -1, local_nls); */


On 6/6/07, Matt Keenan <[EMAIL PROTECTED]> wrote:

This patch makes CIFS honour a process' umask like other filesystems.
Of course the server is still free to munge the permissions if it wants
to; but the client will send the "right" permissions to begin with.

A few caveats:

1) It only applies to filesystems that have CAP_UNIX (i.e. that support the Unix
extensions)
2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms()
after remote creation (I can write a new patch if you want with the
"right" mode at actual creation time; however the "right" perms will
still need to be given to the follow up CIFSSMBUnixSetPerms() anyway).
3) It will probably work best with Samba 3.0.25a or newer (ie with this
patch applied
http://lists.samba.org/archive/linux-cifs-client/2007-January/001697.html)
4) It has been compiled and tested on 2.6.22-rc4 / Samba 3.0.25a
(Ubuntu Dapper with a few custom backports), and with a bit of testing
seems to work just fine. (It also incidentally sidesteps bugs in
Thunderbird and OpenOffice: the apps don't check the permissions on
files they create; they assume the files will open() the way they asked
for them to be created, e.g. open(O_WRONLY|O_CREAT) => valid fd, then
mmap(fd, PROT_READ) => EFAULT.)

I am going to give this patch a more thorough test tomorrow with LTP.
Comments, corrections, et al are welcome.


Matt

--
Matt Keenan
OpCode Solutions



Signed-off-by: Matt Keenan <[EMAIL PROTECTED]>

--
diff -urN linux-2.6.22-rc4/fs/cifs/dir.c linux-2.6.22-rc4.cifs-umask-fix/fs/cifs/dir.c
--- linux-2.6.22-rc4/fs/cifs/dir.c  2007-06-06 08:34:03.0 +0100
+++ linux-2.6.22-rc4.cifs-umask-fix/fs/cifs/dir.c   2007-06-06 19:27:00.0 +0100
@@ -206,7 +206,11 @@
/* If Open reported that we actually created a file
then we now have to set the mode if possible */
if 

Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I was thinking prescriptive, having the decompressor read the output
> stream and interpret it as ELF.  I guess a descriptive approach could be
> made to work, too (I haven't really thought about that avenue of
> approach), but the prescriptive model seems more powerful, at least to me.

Certainly, but much harder to implement.  The ELF parser needs to be
prepared to move itself around to get out of the way of the ELF file. 
It's a fairly large change from how it works now.

I was thinking of making the ELF file entirely descriptive, since it's
just a set of ELF headers inserted into the existing bzImage structure,
and it still relies on the bzImage being built properly in the first place.

J


Re: [RFC] [Patch 4/4] lock contention tracking slimmed down

2007-06-06 Thread Martin Peschke

Ingo Molnar wrote:

* Martin Peschke <[EMAIL PROTECTED]> wrote:


- lock_time_inc() vs. statistic_add_util()


please fix the coding style in lib/statistic.c. It's full of:

{
unsigned long long i;
if (value <= stat->u.histogram.range_min)
return 0;

put a newline after variable sections.

and:

on_each_cpu(_statistic_barrier, NULL, 0, 1);
return 0;

preferably use a newline before 'return' statements as well. (this is 
not always necessary, but in the above case it looks better)


Will do (in my local tree, for the time being).

Good points. Thanks for reviewing.


Martin



Re: [RFC] [Patch 4/4] lock contention tracking slimmed down

2007-06-06 Thread Martin Peschke

Ingo Molnar wrote:

* Martin Peschke <[EMAIL PROTECTED]> wrote:

The output has changed from a terribly wide table to an enormously 
long list (just the generic way the statistics code prints data). 


Sigh, why don't you _ask_ before doing such stuff?


A nice diffstat is always worth a try, isn't it?
And I see other reasons for code sharing.
Ah, and doing it has been actually quite simple once I had figured out
what the original code does. :-)


It is a terribly wide table because that makes it easily greppable


If one looks for contentions of "xtime_lock" within my enormously long list,
they could issue:

   grep -e "xtime_lock contentions" data

and get

   xtime_lock contentions 0x17bd2 3327 account_ticks+0x96/0x184
   xtime_lock contentions other 0

for example.

So how is this worse?


but still watchable in one chunk in a sufficiently zoomed out xterm.


I am wondering whether we really want to reject kernel patches on the basis of
this reasoning, disregarding any other reason why a patch might be helpful.


> Please preserve this output format

I understand why everybody likes their own format best. It's always made to measure.
Choosing a different (or common) format didn't happen in bad faith.

A made-to-measure file format doesn't work well once we start abstracting out this
functionality. And I feel that too much was expected of a low-level kernel ABI
piece.


I would like to add that usability doesn't necessarily suffer if statistics for
some brand new gadget look somewhat familiar and similar to other statistics one
has encountered earlier.



, quite some work went into it - NACK :-(


Considering the amount of code... ;-) I am sorry.

But seriously, did you consider using some user space tool or script to
format this stuff the way you like it - similar to the way the powertop tool
reshuffles timer_stats data found in a proc file, for example?

The format of an enormously long list has been thought out keeping this 
particular requirement in mind.



Martin



Re: [RFC][PATCH] Re: 4Gb ram not showing up

2007-06-06 Thread Robert Hancock

Andrew Lyon wrote:

Could this also cause a system to be unstable? My Abit Athlon64 at
work will not run x64 with more than 1GB of RAM, and I have a colo server
with a Supermicro and 2 x dual-core Xeons that will not run with more than
2GB.

Both systems have long uptimes, but if I add RAM they crash within
minutes of booting.

I tried several kernels up to 2.6.21 and gave up. I can send dmesg
output, but the crashes are completely random.


That is likely some other problem, like bad RAM or memory timing issues.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> 
> I'm not sure I fully understand the mechanism you're proposing.  You
> have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
> the decompressor has extracted the actual ELF file, are you proposing
> that it properly parse the ELF file and follow its instructions to put
> the segments in the appropriate places, or are you assuming that the
> decompressor can just skip that part and plonk the ELF file where it wants?
> 
> In other words, do you see the Phdrs as being descriptive or prescriptive?
> 

I was thinking prescriptive, having the decompressor read the output
stream and interpret it as ELF.  I guess a descriptive approach could be
made to work, too (I haven't really thought about that avenue of
approach), but the prescriptive model seems more powerful, at least to me.

-hpa


Re: [PATCH 00/10] Containers(V10): Generic Process Containers

2007-06-06 Thread Serge E. Hallyn
Quoting Paul Jackson ([EMAIL PROTECTED]):
> > > I wasn't paying close enough attention to understand why you couldn't
> > > do it in two steps - make the container, and then populate it with
> > > resources.
> > 
> > Sorry, please clarify - are you saying that now you do understand, or
> > that I should explain?
> 
> Could you explain -- I still don't understand why you need this option.
> I still don't understand why you can't do it in two steps - make the
> container, then add cpu/mem separately.

Sure - the key is that the ns subsystem uses container_clone() to
automatically create a new container (on sys_unshare() or clone(2)
with certain flags) and move the current task into it.  Let's say
we have done

mount -t container -o ns,cpuset nsproxy /containers

and we, as task 875, happen to be in the topmost container:

/containers/

Now we fork task 999 which does an unshare(CLONE_NEWNS), or we just
clone(CLONE_NEWNS).  This will create

/containers/node_999

and move task 999 into that container.  Except that when it tries
attach_task() it is refused by cpuset.  So the container_clone() fails,
and in turn the sys_unshare() or clone() fails.  A login making use
of the pam_namespace.so library would fail this way with the
ns and cpuset subsystems composed.

We could special case this by having
kernel/container.c:container_clone() check whether one of the subsystems
is cpusets and, if so, setting the defaults for mems and cpus, but
that is kind of ugly.  I suppose as a cleaner alternative we could 
add a container_subsys->inherit_defaults() handler, to be called at
container_clone(), and for cpusets this would set cpus and mems to
the parent's values minus any values held exclusively by siblings.  If that comes to nothing,
then the attach_task() is still refused, and the unshare() or clone()
fails, but this time with good reason.

thanks,
-serge


Re: 2.6.22-rc4-mm1

2007-06-06 Thread Robert Hancock

Andrew Morton wrote:

Yeah, this caused test.kernel.org to fail as well.

There are a couple of fixes in
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm1/hot-fixes/
which should get things going again.

Robert, I spent some time picking at
mmconfig-validate-against-acpi-motherboard-resources.patch then got bored
with fiddling with it and reverted it outright.


The minimal fix would be to put #ifdef CONFIG_PCI_MMCONFIG around the 
call to pci_mmcfg_late_init in drivers/acpi/bus.c.




Please, we need to get those prototypes of pci_mmcfg_early_init() and
pci_mmcfg_late_init() into some sane place which works on all
architectures, not duplicate one of them in a C file and even see if we can
avoid the #ifdef CONFIG_PCI_MMCONFIG in arch/i386/pci/init.c

This code area is really messy, due partly to the x86_64 and i386 sharing. 
Any changes in there need careful testing and checking.


I'm not sure there's a point in making the prototypes for those 
functions global to all architectures, since it's unlikely anything 
non-X86 could make use of them with any similar semantics. We could 
provide a no-op definition of those functions to avoid the need to ifdef 
the calls, though.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> I still believe that we should provide, in effect, vmlinux as a
> (compressed) ELF file rather than provide the intermediate stage.  It
> would reduce the complexity of testing (all information provided about a
> stage has to be both guaranteed to even make sense in the future as
> well as be tested to conform to such information

I'm not sure I follow you.  Sure, you're right that the Phdr info
contained within the bzImage needs to be tested for correctness.  This
wouldn't normally happen when booting native, but when booting under the
most constrained environment - Xen - it will be tested (and I intend
making the Xen loader as strict as possible).  Of course, it won't help
if the Phdrs overmap, but I don't think that matters much, so long as
the mappings are not excessively large.

I'm not sure what you mean about "make sense in the future".  If you're
booting the kernel in a new paravirtualized environment, you've
presumably modified the kernel to understand that environment, and
perhaps had to update the boot image format a bit to deal with its
requirements.  I agree that updating the bzImage format may require
retesting in all the other environments, but I think that's probably
true for your scheme as well.

After all, you're assuming that the vmlinux itself provides all
necessary information to be loaded in any environment, which is not
necessarily true (it may need extra ELF notes, for example).  But if
there are any major structural changes needed in the vmlinux, then that
will be equally problematic for both directly using vmlinux and using
ELF-in-bzImage.  So I don't think your argument convincingly sways in
any particular direction.

> ) as well as cover a
> larger number of environments -- any environment where injecting data
> into memory is cheaper than execution is quite unhappy about the current
> system.  Such environments include heterogeneous embedded systems (think
> a slow CPU on a PCI card where the host CPU has direct access to the
> memory on the card) as well as simulators/emulators.
>   

Well, nothing in this scheme precludes the ELF file from being a plain
uncompressed kernel image.  If that's what these environments want, it's
easy to provide with a small update to the Makefiles.

> For environments where that is appropriate it would even be possible to
> run the setup, invoke the code32_setup hook to do the decompression (and
> relocation, if appropriate) in host space.
>   

Well, that's what we currently have, and we can't break backwards
compatibility.

> This makes vmlinux (normally stripped) recoverable from the bzImage file
> and so anything that is currently booting vmlinux would be serviced by
> this scheme.
>   

I'm not sure I fully understand the mechanism you're proposing.  You
have the 16-bit setup code, the 32-bit decompressor, and an ELF.gz. Once
the decompressor has extracted the actual ELF file, are you proposing
that it properly parse the ELF file and follow its instructions to put
the segments in the appropriate places, or are you assuming that the
decompressor can just skip that part and plonk the ELF file where it wants?

In other words, do you see the Phdrs as being descriptive or prescriptive?

J


Re: Another missing RAM on x86_64

2007-06-06 Thread Robert Hancock

Reinaldo de Carvalho wrote:

This laptop has an nVidia 10de:0244 with 256MB of RAM. No shared memory.


Reinaldo de Carvalho



00:05.0 VGA compatible controller: nVidia Corporation C51 PCI Express Bridge (rev a2) (prog-if 00 [VGA])
	Subsystem: Hewlett-Packard Company Unknown device 30b5
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR-
	Latency: 0
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at c200 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at d000 (64-bit, prefetchable) [size=256M]
	Region 3: Memory at c100 (64-bit, non-prefetchable) [size=16M]
	[virtual] Expansion ROM at 5000 [disabled] [size=128K]
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
		Address:   Data: 


It's a bit mysterious, then, where the extra 40-some MB of RAM has gone. 
However, there's not much the kernel can do about it, as the BIOS is not 
telling us about that memory.


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread H. Peter Anvin
Jeremy Fitzhardinge wrote:
> This patch makes the payload of the bzImage file an ELF file.  In
> other words, the bzImage is structured as follows:
>  - boot sector
>  - 16bit setup code
>  - ELF header
>   - decompressor
>   - compressed kernel
> 
> A bootloader may find the start of the ELF file by looking at the
> setup_size entry in the boot params, and using that to find the offset
> of the ELF header.  The ELF Phdrs contain all the mapped memory
> required to decompress and start booting the kernel.
> 
> One slightly complex part of this is that the bzImage boot_params need
> to know about the internal structure of the ELF file, at least to the
> extent of being able to point the code32_start entry at the ELF file's
> entrypoint, so that loaders which use this field will still work.
> 
> Similarly, the ELF header needs to know how big the kernel vmlinux's
> bss segment is, in order to make sure it is mapped properly.
> 
> To handle these two cases, we generate abstracted versions of the
> object files which only contain the symbols we care about (generated
> with objcopy --strip-all --keep-symbol=X), and then include those
> symbol tables with ld -R.

I still believe that we should provide, in effect, vmlinux as a
(compressed) ELF file rather than provide the intermediate stage.  It
would reduce the complexity of testing (all information provided about a
stage has to be both guaranteed to even make sense in the future as
well as be tested to conform to such information) as well as cover a
larger number of environments -- any environment where injecting data
into memory is cheaper than execution is quite unhappy about the current
system.  Such environments include heterogeneous embedded systems (think
a slow CPU on a PCI card where the host CPU has direct access to the
memory on the card) as well as simulators/emulators.

For environments where that is appropriate, it would even be possible to
run the setup, invoke the code32_setup hook to do the decompression (and
relocation, if appropriate) in host space.

This makes vmlinux (normally stripped) recoverable from the bzImage file
and so anything that is currently booting vmlinux would be serviced by
this scheme.

-hpa


Re: 2.6.22-rc4-mm1

2007-06-06 Thread Andrew Morton
On Thu, 7 Jun 2007 07:28:31 +1000 Herbert Xu <[EMAIL PROTECTED]> wrote:

> On Wed, Jun 06, 2007 at 01:24:39PM -0700, Andrew Morton wrote:
> >
> > > And for some reason the whole Cryptographic API is under the main level 
> > > of menu 
> > > (please find .config and menu.png attached).
> > 
> > err, yes.  git-cryptodev.patch did that.  Herbert is being immodest ;)
> 
> Is it this patch?

looks like it.

> If so then you sent to it me :)

You merged it ;)

> Should I drop it?

Sure, Jan will fix it up, I assume.  I might have broken it while repairing
the reject storm which occurred when that durned HAS_IOMEM thing went in
all over the tree.




[PATCH RFC 4/7] define ELF notes for adding to a boot image

2007-06-06 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>

---
 include/linux/elf_boot.h |   15 +++
 1 file changed, 15 insertions(+)

===
--- /dev/null
+++ b/include/linux/elf_boot.h
@@ -0,0 +1,15 @@
+#ifndef ELF_BOOT_H
+#define ELF_BOOT_H
+
+/* Elf notes to help bootloaders identify what program they are booting.
+ */
+
+/* Standardized Elf image notes for booting... The name for all of these is ELFBoot */
+#define ELF_NOTE_BOOT  ELFBoot
+
+#define EIN_PROGRAM_NAME       1 /* The program in this ELF file */
+#define EIN_PROGRAM_VERSION    2 /* The version of the program in this ELF file */
+#define EIN_PROGRAM_CHECKSUM   3 /* ip style checksum of the memory image. */
+#define EIN_ARGUMENT_STYLE     4 /* String identifying argument passing style */
+
+#endif /* ELF_BOOT_H */
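
The header above only defines the note name and type numbers. As a rough sketch of how a build tool or bootloader might lay out one such note, assuming the standard ELF note format (a name-size/desc-size/type header followed by the name and the descriptor, each padded to 4 bytes) and glibc's userspace <elf.h>; the helper names and the "Linux" descriptor are purely illustrative:

```c
#include <elf.h>      /* Elf32_Nhdr, Elf32_Word (glibc userspace header) */
#include <stddef.h>
#include <string.h>

#define EIN_PROGRAM_NAME 1	/* value copied from elf_boot.h above */

/* Note name and descriptor are each padded to a 4-byte boundary. */
static size_t note_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Bytes needed for one note: header + padded name + padded descriptor. */
static size_t elf32_note_size(const char *name, size_t descsz)
{
	/* n_namesz counts the terminating NUL */
	return sizeof(Elf32_Nhdr) + note_align4(strlen(name) + 1) +
	       note_align4(descsz);
}

/*
 * Hypothetical helper: serialise one note (host byte order) into buf,
 * which must hold elf32_note_size() bytes; returns the bytes written.
 */
static size_t elf32_note_write(unsigned char *buf, const char *name,
			       Elf32_Word type, const void *desc,
			       size_t descsz)
{
	Elf32_Nhdr hdr;
	size_t namesz = strlen(name) + 1;
	size_t off = sizeof(hdr);

	hdr.n_namesz = (Elf32_Word)namesz;
	hdr.n_descsz = (Elf32_Word)descsz;
	hdr.n_type = type;

	memset(buf, 0, elf32_note_size(name, descsz));	/* zero the padding */
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + off, name, namesz);
	off += note_align4(namesz);
	memcpy(buf + off, desc, descsz);
	off += note_align4(descsz);
	return off;
}
```

With these helpers, a note named "ELFBoot" of type EIN_PROGRAM_NAME carrying "Linux\0" takes 28 bytes: a 12-byte header, "ELFBoot\0" (8 bytes), and the 6-byte descriptor padded to 8.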

-- 



[PATCH RFC 3/7] allow linux/elf.h to be included in assembler

2007-06-06 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 include/linux/elf.h |   24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

===
--- a/include/linux/elf.h
+++ b/include/linux/elf.h
@@ -1,9 +1,10 @@
 #ifndef _LINUX_ELF_H
 #define _LINUX_ELF_H
 
+#include 
+#ifndef __ASSEMBLY__
 #include 
 #include 
-#include 
 #include 
 
 struct file;
@@ -31,6 +32,7 @@ typedef __u32 Elf64_Word;
 typedef __u32  Elf64_Word;
 typedef __u64  Elf64_Xword;
 typedef __s64  Elf64_Sxword;
+#endif /* __ASSEMBLY__ */
 
 /* These constants are for the segment types stored in the image headers */
 #define PT_NULL        0
@@ -123,6 +125,7 @@ typedef __s64   Elf64_Sxword;
 #define ELF64_ST_BIND(x)   ELF_ST_BIND(x)
 #define ELF64_ST_TYPE(x)   ELF_ST_TYPE(x)
 
+#ifndef __ASSEMBLY__
 typedef struct dynamic{
   Elf32_Sword d_tag;
   union{
@@ -138,6 +141,7 @@ typedef struct {
 Elf64_Addr d_ptr;
   } d_un;
 } Elf64_Dyn;
+#endif /* __ASSEMBLY__ */
 
 /* The following are used with relocations */
 #define ELF32_R_SYM(x) ((x) >> 8)
@@ -146,6 +150,7 @@ typedef struct {
 #define ELF64_R_SYM(i) ((i) >> 32)
 #define ELF64_R_TYPE(i)        ((i) & 0xffffffff)
 
+#ifndef __ASSEMBLY__
 typedef struct elf32_rel {
   Elf32_Addr   r_offset;
   Elf32_Word   r_info;
@@ -185,11 +190,12 @@ typedef struct elf64_sym {
   Elf64_Addr st_value; /* Value of the symbol */
   Elf64_Xword st_size; /* Associated symbol size */
 } Elf64_Sym;
-
+#endif /* __ASSEMBLY__ */
 
 #define EI_NIDENT  16
 
-typedef struct elf32_hdr{
+#ifndef __ASSEMBLY__
+typedef struct elf32_hdr {
   unsigned char    e_ident[EI_NIDENT];
   Elf32_Half   e_type;
   Elf32_Half   e_machine;
@@ -222,6 +228,7 @@ typedef struct elf64_hdr {
   Elf64_Half e_shnum;
   Elf64_Half e_shstrndx;
 } Elf64_Ehdr;
+#endif /* __ASSEMBLY__ */
 
 /* These constants define the permissions on sections in the program
header, p_flags. */
@@ -229,7 +236,8 @@ typedef struct elf64_hdr {
 #define PF_W   0x2
 #define PF_X   0x1
 
-typedef struct elf32_phdr{
+#ifndef __ASSEMBLY__
+typedef struct elf32_phdr {
   Elf32_Word   p_type;
   Elf32_Off    p_offset;
   Elf32_Addr   p_vaddr;
@@ -250,6 +258,7 @@ typedef struct elf64_phdr {
   Elf64_Xword p_memsz; /* Segment size in memory */
   Elf64_Xword p_align; /* Segment alignment, file & memory */
 } Elf64_Phdr;
+#endif /* __ASSEMBLY__ */
 
 /* sh_type */
 #define SHT_NULL   0
@@ -284,7 +293,8 @@ typedef struct elf64_phdr {
 #define SHN_ABS        0xfff1
 #define SHN_COMMON     0xfff2
 #define SHN_HIRESERVE  0xffff
- 
+
+#ifndef __ASSEMBLY__
 typedef struct {
   Elf32_Word   sh_name;
   Elf32_Word   sh_type;
@@ -310,6 +320,7 @@ typedef struct elf64_shdr {
   Elf64_Xword sh_addralign;/* Section alignment */
   Elf64_Xword sh_entsize;  /* Entry size if section holds table */
 } Elf64_Shdr;
+#endif /* __ASSEMBLY__ */
 
 #define EI_MAG0        0   /* e_ident[] indexes */
 #define EI_MAG1        1
@@ -343,6 +354,7 @@ typedef struct elf64_shdr {
 
 #define ELFOSABI_NONE  0
 #define ELFOSABI_LINUX 3
+#define ELFOSABI_STANDALONE 255
 
 #ifndef ELF_OSABI
 #define ELF_OSABI ELFOSABI_NONE
@@ -357,6 +369,7 @@ typedef struct elf64_shdr {
 #define NT_PRXFPREG 0x46e62b7f  /* copied from gdb5.1/include/elf/common.h */
 
 
+#ifndef __ASSEMBLY__
 /* Note header in a PT_NOTE section */
 typedef struct elf32_note {
   Elf32_Word   n_namesz;   /* Name size */
@@ -396,5 +409,6 @@ static inline void arch_write_notes(stru
 #define ELF_CORE_EXTRA_NOTES_SIZE arch_notes_size()
 #define ELF_CORE_WRITE_EXTRA_NOTES arch_write_notes(file)
 #endif /* ARCH_HAVE_EXTRA_ELF_NOTES */
+#endif /* __ASSEMBLY__ */
 
 #endif /* _LINUX_ELF_H */

-- 



[PATCH RFC 7/7] i386: paravirt boot sequence

2007-06-06 Thread Jeremy Fitzhardinge
This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:

 1. Check the bootparams loadflags to see if we should reload the
segment registers and clear interrupts.  This is appropriate
for normal native boot and some paravirtualized environments, but
inappropriate for others.

 2. Check the hardware architecture, and dispatch to the appropriate
kernel entrypoint.  If the bootloader doesn't set this, then we
simply do the normal boot sequence.
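The two decisions above can be restated in C to make the control flow of the head.S hunk easier to follow. This is only a sketch of the logic, not kernel code: the function and enum names are hypothetical, and in the actual patch the subarch dispatch is compiled in only under CONFIG_PARAVIRT.

```c
#include <stdint.h>

#define BOOT_PROTO_2_07	0x0207
#define KEEP_SEGMENTS	(1u << 6)	/* loadflags bit 6, new in 2.07 */

/* hardware_subarch values, in the same order as subarch_entries[] */
enum subarch { SUBARCH_PC, SUBARCH_LGUEST, SUBARCH_XEN, NUM_SUBARCH };

/* Step 1: mirrors "cmpw $0x207, BP_version; jb 1f; testb $(1<<6), ..." */
static int should_reload_segments(uint16_t version, uint8_t loadflags)
{
	if (version < BOOT_PROTO_2_07)
		return 1;	/* older loaders cannot set KEEP_SEGMENTS */
	return !(loadflags & KEEP_SEGMENTS);
}

/* Step 2: index of the entry point to jump to, or -1 for bad_subarch. */
static int select_entry(uint16_t version, uint32_t hardware_subarch)
{
	if (version < BOOT_PROTO_2_07)
		return SUBARCH_PC;		/* default_entry */
	if (hardware_subarch >= NUM_SUBARCH)
		return -1;			/* head.S executes ud2a here */
	return (int)hardware_subarch;
}
```

A loader that leaves hardware_subarch at 0 therefore always ends up on the normal x86/PC path.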

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/boot/header.S |9 -
 arch/i386/kernel/head.S |   47 +++
 2 files changed, 51 insertions(+), 5 deletions(-)

===
--- a/arch/i386/boot/header.S
+++ b/arch/i386/boot/header.S
@@ -119,7 +119,7 @@ 1:
# Part 2 of the header, from the old setup.S
 
.ascii  "HdrS"  # header signature
-   .word   0x0206  # header version number (>= 0x0105)
+   .word   0x0207  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -209,6 +209,13 @@ cmdline_size:   .long   COMMAND_LINE_SIZ
 #added with boot protocol
 #version 2.06
 
+hardware_subarch:  .long 0 # subarchitecture, added with 2.07
+   # default to 0 for normal x86 PC
+
+hardware_subarch_data: .quad 0
+
+kernel_payload:.long blob_payload  # raw kernel data
+
 # End of setup header #
 
.section ".inittext", "ax"
===
--- a/arch/i386/kernel/head.S
+++ b/arch/i386/kernel/head.S
@@ -71,28 +71,37 @@ INIT_MAP_BEYOND_END = BOOTBITMAP_SIZE + 
  */
 .section .text.head,"ax",@progbits
 ENTRY(startup_32)
+   /* check to see if KEEP_SEGMENTS flag is meaningful */
+   cmpw $0x207, BP_version(%esi)
+   jb 1f
+
+   /* test KEEP_SEGMENTS flag to see if the bootloader is asking
+   us to not reload segments */
+   testb $(1<<6), BP_loadflags(%esi)
+   jnz 2f
 
 /*
  * Set segments to known values.
  */
-   cld
-   lgdt boot_gdt_descr - __PAGE_OFFSET
+1: lgdt boot_gdt_descr - __PAGE_OFFSET
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
+2:
 
 /*
  * Clear BSS first so that there are no surprises...
- * No need to cld as DF is already clear from cld above...
- */
+ */
+   cld
xorl %eax,%eax
movl $__bss_start - __PAGE_OFFSET,%edi
movl $__bss_stop - __PAGE_OFFSET,%ecx
subl %edi,%ecx
shrl $2,%ecx
rep ; stosl
+
 /*
  * Copy bootup parameters out of the way.
  * Note: %esi still has the pointer to the real-mode data.
@@ -120,6 +129,35 @@ 2:
movsl
 1:
 
+#ifdef CONFIG_PARAVIRT
+   cmpw $0x207, (boot_params + BP_version - __PAGE_OFFSET)
+   jb default_entry
+
+   /* Paravirt-compatible boot parameters.  Look to see what architecture
+   we're booting under. */
+   movl (boot_params + BP_hardware_subarch - __PAGE_OFFSET), %eax
+   cmpl $num_subarch_entries, %eax
+   jae bad_subarch
+
+   movl subarch_entries - __PAGE_OFFSET(,%eax,4), %eax
+   subl $__PAGE_OFFSET, %eax
+   jmp *%eax
+
+bad_subarch:
+WEAK(lguest_entry)
+WEAK(xen_entry)
+   /* Unknown implementation; there's really
+  nothing we can do at this point. */
+   ud2a
+.data
+subarch_entries:
+   .long default_entry /* normal x86/PC */
+   .long lguest_entry  /* lguest hypervisor */
+   .long xen_entry /* Xen hypervisor */
+num_subarch_entries = (. - subarch_entries) / 4
+.previous
+#endif /* CONFIG_PARAVIRT */
+
 /*
  * Initialize page tables.  This creates a PDE and a set of page
  * tables, which are located immediately beyond _end.  The variable
@@ -132,6 +170,7 @@ 1:
  */
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
+default_entry:
movl $(pg0 - __PAGE_OFFSET), %edi
movl $(swapper_pg_dir - __PAGE_OFFSET), %edx
movl $0x007, %eax   /* 0x007 = PRESENT+RW+USER */

-- 


[PATCH RFC 1/7] update boot spec to 2.07

2007-06-06 Thread Jeremy Fitzhardinge
Proposed updates for version 2.07 of the boot protocol.  This includes:

loadflags.KEEP_SEGMENTS - flag to request/inhibit segment reloads
hardware_subarch        - what subarchitecture we're booting under
hardware_subarch_data   - per-architecture data
kernel_payload          - address of the raw kernel blob
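Since all boot protocol fields are little-endian, a loader written in C might touch the new 2.07 fields roughly as follows. This is a sketch assuming the offsets proposed in the boot.txt hunk of this patch (0x23c, 0x240, 0x248, relative to the start of the real-mode kernel image); the helper names are hypothetical.

```c
#include <stdint.h>

/* Offsets proposed for boot protocol 2.07 (see the boot.txt hunk). */
#define HDR_HARDWARE_SUBARCH		0x23c	/* 4 bytes, written by loader */
#define HDR_HARDWARE_SUBARCH_DATA	0x240	/* 8 bytes, written by loader */
#define HDR_KERNEL_PAYLOAD		0x248	/* 4 bytes, read by loader */

/* All boot protocol fields are little-endian. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

static void put_le64(uint8_t *p, uint64_t v)
{
	put_le32(p, (uint32_t)v);
	put_le32(p + 4, (uint32_t)(v >> 32));
}

/* Loader side: announce the subarchitecture and its private data. */
static void set_subarch(uint8_t *hdr, uint32_t subarch, uint64_t data)
{
	put_le32(hdr + HDR_HARDWARE_SUBARCH, subarch);
	put_le64(hdr + HDR_HARDWARE_SUBARCH_DATA, data);
}

/* Loader side: locate the raw (normally gzip) kernel payload. */
static uint32_t get_kernel_payload(const uint8_t *hdr)
{
	return get_le32(hdr + HDR_KERNEL_PAYLOAD);
}
```

Using explicit byte-wise accessors rather than a packed struct keeps the sketch independent of host endianness and compiler packing rules.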

The goal of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.  The intention is that
the bzImage payload can be a properly formed ELF file, so that the
bootloader can use its ELF notes and Phdrs to get more metadata about
the kernel and its requirements.

The ELF file could be the uncompressed kernel vmlinux itself; it would
only take small buildsystem changes to implement this.

kernel_payload was added so that a bootloader can just get to the raw
bits of the kernel, so that it can do its own decompression/relocation
if it wishes.  This is not particularly well-defined yet; I just added
it with the hope that it keeps HPA happy.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 Documentation/i386/boot.txt|   43 +++-
 arch/i386/kernel/asm-offsets.c |7 ++
 include/asm-i386/bootparam.h   |   10 +++--
 3 files changed, 57 insertions(+), 3 deletions(-)

===
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -168,6 +168,9 @@ 0234/1  2.05+   relocatable_kernel Whether 
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/3 N/A pad2Unused
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
+023C/4 2.07+   hardware_subarch Hardware subarchitecture
+0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
+0248/4 2.07+   kernel_payload  Pointer to raw kernel data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -204,7 +207,7 @@ boot loaders can ignore those fields.
 
 The byte order of all fields is littleendian (this is x86, after all.)
 
-Field name:    setup_secs
+Field name:    setup_sects
 Type:  read
 Offset/size:   0x1f1/1
 Protocol:  ALL
@@ -356,6 +359,13 @@ Protocol:  2.00+
   - If 0, the protected-mode code is loaded at 0x10000.
   - If 1, the protected-mode code is loaded at 0x100000.
 
+  Bit 6 (write): KEEP_SEGMENTS
+   Protocol: 2.07+
+   - if 0, reload the segment registers in the 32bit entry point.
+   - if 1, do not reload the segment registers in the 32bit entry point.
+   Assume that %cs %ds %ss %es are all set to flat segments with
+   a base of 0 (or the equivalent for their environment).
+
   Bit 7 (write): CAN_USE_HEAP
Set this bit to 1 to indicate that the value entered in the
heap_end_ptr is valid.  If this field is clear, some setup code
@@ -479,6 +489,37 @@ Protocol:  2.06+
   zero. This means that the command line can contain at most
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
+
+Field name:    hardware_subarch
+Type:  write
+Offset/size:   0x23c/4
+Protocol:  2.07+
+
+  In a paravirtualized environment, the low-level hardware architectural
+  pieces such as interrupt handling, page table handling, and
+  access to processor control registers need to be handled differently.
+
+  This field allows the bootloader to inform the kernel that we are in
+  one of those environments.
+
+  0x00000000   The default x86/PC environment
+  0x00000001   lguest
+  0x00000002   Xen
+
+Field name:    hardware_subarch_data
+Type:  write
+Offset/size:   0x240/8
+Protocol:  2.07+
+
+  A pointer to data that is specific to the hardware subarch.
+
+Field name:    kernel_payload
+Type:  read
+Offset/size:   0x248/4
+Protocol:  2.07+
+
+  The relocated pointer to the actual kernel payload, in whatever form
+  it exists in (gzip image, normally).
 
 
  THE KERNEL COMMAND LINE
===
--- a/arch/i386/kernel/asm-offsets.c
+++ b/arch/i386/kernel/asm-offsets.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -143,4 +144,10 @@ void foo(void)
OFFSET(LGUEST_PAGES_regs_errcode, lguest_pages, regs.errcode);
OFFSET(LGUEST_PAGES_regs, lguest_pages, regs);
 #endif
+
+   BLANK();
+   OFFSET(BP_scratch, boot_params, scratch);
+   OFFSET(BP_loadflags, boot_params, hdr.loadflags);
+   OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
+   OFFSET(BP_version, boot_params, hdr.version);
 }
===
--- a/include/asm-i386/bootparam.h
+++ b/include/asm-i386/bootparam.h
@@ -24,8 +24,9 @@ 

[PATCH RFC 2/7] add WEAK() for creating weak asm labels

2007-06-06 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 include/linux/linkage.h |6 ++
 1 file changed, 6 insertions(+)

===
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -34,6 +34,12 @@
   name:
 #endif
 
+#ifndef WEAK
+#define WEAK(name)\
+   .weak name;\
+   name:
+#endif
+
 #define KPROBE_ENTRY(name) \
   .pushsection .kprobes.text, "ax"; \
   ENTRY(name)
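
WEAK() emits a .weak label, so a strong definition of the same symbol elsewhere takes precedence. In patch 7/7 this lets lguest_entry and xen_entry fall back to bad_subarch when those hypervisors are not built in. A rough C analogue, using GCC's weak and alias function attributes (ELF targets only; every *_demo name is hypothetical):

```c
/*
 * The paravirt entry points default to a weak alias of the "bad subarch"
 * handler, so the table below links even with no lguest or Xen code.
 * A strong lguest_entry_demo() in another object file would override it.
 */
int bad_subarch_demo(void)
{
	return -1;		/* stands in for the ud2a trap */
}

int default_entry_demo(void)
{
	return 0;		/* normal x86/PC boot */
}

int lguest_entry_demo(void) __attribute__((weak, alias("bad_subarch_demo")));
int xen_entry_demo(void) __attribute__((weak, alias("bad_subarch_demo")));

typedef int (*entry_fn)(void);

static entry_fn const subarch_entries_demo[] = {
	default_entry_demo,	/* 0: normal x86/PC */
	lguest_entry_demo,	/* 1: lguest hypervisor */
	xen_entry_demo,		/* 2: Xen hypervisor */
};
```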

-- 



[PATCH RFC 5/7] i386: clean up bzImage generation

2007-06-06 Thread Jeremy Fitzhardinge
This patch cleans up image generation in several ways:
 - Firstly, it removes tools/build, and uses binutils to do all the
   final construction of the bzImage.  This removes a chunk of code
   and makes the image generation more flexible, since we can compute
   various numbers rather than be forced to use fixed constants.

 - Rename compressed/vmlinux to compressed/blob, to make it a
   bit clearer that it's the compressed kernel image + decompressor
   (now all the files named "vmlinux*" are directly derived from
   the kernel vmlinux).

 - Rather than using objcopy to wrap the compressed kernel into an
   object file, simply use the assembler: payload.S does a .incbin
   of the blob.bin file, which allows us to easily place
   it into a section, and it makes the Makefile dependency a little
   clearer.

 - Similarly, use the same technique to create compressed/piggy.o,
   which cleans things up even more, since the .S file can also
   set the input and output_size symbols without further linker
   script hackery; it also removes a complete linker script.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/boot/Makefile   |   31 +-
 arch/i386/boot/compressed/Makefile|   13 --
 arch/i386/boot/compressed/piggy.S |   10 +
 arch/i386/boot/compressed/vmlinux.scr |   10 -
 arch/i386/boot/header.S   |6 -
 arch/i386/boot/payload.S  |3 
 arch/i386/boot/setup.ld   |   39 ---
 arch/i386/boot/tools/.gitignore   |1 
 arch/i386/boot/tools/build.c  |  168 -
 9 files changed, 56 insertions(+), 225 deletions(-)

===
--- a/arch/i386/boot/Makefile
+++ b/arch/i386/boot/Makefile
@@ -25,12 +25,13 @@ SVGA_MODE := -DSVGA_MODE=NORMAL_VGA
 
 #RAMDISK := -DRAMDISK=512
 
-targets:= vmlinux.bin setup.bin setup.elf zImage bzImage
+targets:= blob.bin setup.elf zImage bzImage
 subdir-:= compressed
 
 setup-y+= a20.o apm.o cmdline.o copy.o cpu.o cpucheck.o edd.o
-setup-y+= header.o main.o mca.o memory.o pm.o pmjump.o
-setup-y+= printf.o string.o tty.o video.o version.o voyager.o
+setup-y+= header.o main.o mca.o memory.o payload.o pm.o
+setup-y+= pmjump.o printf.o string.o tty.o video.o version.o
+setup-y+= voyager.o
 
 # The link order of the video-*.o modules can matter.  In particular,
 # video-vga.o *must* be listed first, followed by video-vesa.o.
@@ -39,10 +40,6 @@ setup-y  += video-vga.o
 setup-y+= video-vga.o
 setup-y+= video-vesa.o
 setup-y+= video-bios.o
-
-hostprogs-y:= tools/build
-
-HOSTCFLAGS_build.o := $(LINUXINCLUDE)
 
 # ---
 
@@ -65,18 +62,12 @@ AFLAGS  := $(CFLAGS) -D__ASSEMBLY__
 $(obj)/bzImage: IMAGE_OFFSET := 0x100000
 $(obj)/bzImage: EXTRA_CFLAGS := -D__BIG_KERNEL__
 $(obj)/bzImage: EXTRA_AFLAGS := $(SVGA_MODE) $(RAMDISK) -D__BIG_KERNEL__
-$(obj)/bzImage: BUILDFLAGS   := -b
 
-quiet_cmd_image = BUILD   $@
-cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/setup.bin \
-   $(obj)/vmlinux.bin $(ROOT_DEV) > $@
-
-$(obj)/zImage $(obj)/bzImage: $(obj)/setup.bin \
- $(obj)/vmlinux.bin $(obj)/tools/build FORCE
-   $(call if_changed,image)
+$(obj)/zImage $(obj)/bzImage: $(obj)/setup.elf FORCE
+   $(call if_changed,objcopy)
@echo 'Kernel: $@ is ready' ' (#'`cat .version`')'
 
-$(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE
+$(obj)/blob.bin: $(obj)/compressed/blob FORCE
$(call if_changed,objcopy)
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
@@ -85,12 +76,10 @@ LDFLAGS_setup.elf   := -T
 $(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
$(call if_changed,ld)
 
-OBJCOPYFLAGS_setup.bin := -O binary
+$(obj)/payload.o:  EXTRA_AFLAGS := -Wa,-I$(obj)
+$(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin
 
-$(obj)/setup.bin: $(obj)/setup.elf FORCE
-   $(call if_changed,objcopy)
-
-$(obj)/compressed/vmlinux: FORCE
+$(obj)/compressed/blob: FORCE
$(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@
 
 # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel
===
--- a/arch/i386/boot/compressed/Makefile
+++ b/arch/i386/boot/compressed/Makefile
@@ -4,11 +4,10 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets:= vmlinux vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
+targets:= blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
  

[PATCH RFC 6/7] i386: make the bzImage payload an ELF file

2007-06-06 Thread Jeremy Fitzhardinge
This patch makes the payload of the bzImage file an ELF file.  In
other words, the bzImage is structured as follows:
 - boot sector
 - 16bit setup code
 - ELF header
  - decompressor
  - compressed kernel

A bootloader may find the start of the ELF file by looking at the
setup_sects entry in the boot params, and using that to find the offset
of the ELF header.  The ELF Phdrs contain all the mapped memory
required to decompress and start booting the kernel.
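As a rough illustration of that lookup, the sketch below assumes the existing rules from boot.txt: the 1-byte setup_sects field lives at offset 0x1f1, a value of 0 means 4, and the real-mode part of the image is one 512-byte boot sector followed by setup_sects 512-byte sectors. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SETUP_SECTS_OFFSET 0x1f1	/* setup_sects, 1 byte (boot.txt) */

/* Return the file offset of the ELF header in a bzImage, or -1. */
static long bzimage_elf_offset(const uint8_t *img, size_t len)
{
	unsigned int sects;
	size_t off;

	if (len <= SETUP_SECTS_OFFSET)
		return -1;
	sects = img[SETUP_SECTS_OFFSET];
	if (sects == 0)
		sects = 4;			/* backwards-compatibility rule */
	off = (size_t)(sects + 1) * 512;	/* boot sector + setup code */
	if (off + 4 > len || memcmp(img + off, "\177ELF", 4) != 0)
		return -1;			/* no ELF magic where expected */
	return (long)off;
}
```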

One slightly complex part of this is that the bzImage boot_params need
to know about the internal structure of the ELF file, at least to the
extent of being able to point the code32_start entry at the ELF file's
entrypoint, so that loaders which use this field will still work.

Similarly, the ELF header needs to know how big the kernel vmlinux's
bss segment is, in order to make sure it is mapped properly.

To handle these two cases, we generate abstracted versions of the
object files which only contain the symbols we care about (generated
with objcopy --strip-all --keep-symbol=X), and then include those
symbol tables with ld -R.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: H. Peter Anvin <[EMAIL PROTECTED]>
Cc: Vivek Goyal <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>

---
 arch/i386/boot/Makefile   |   11 --
 arch/i386/boot/compressed/Makefile|   29 +--
 arch/i386/boot/compressed/elfhdr.S|   60 +
 arch/i386/boot/compressed/head.S  |9 ++--
 arch/i386/boot/compressed/notes.S |7 +++
 arch/i386/boot/compressed/vmlinux.lds |   24 ++---
 arch/i386/boot/header.S   |7 ---
 arch/i386/boot/setup.ld   |5 ++
 arch/i386/kernel/head.S   |1 
 arch/i386/kernel/vmlinux.lds.S|1 
 10 files changed, 131 insertions(+), 23 deletions(-)

===
--- a/arch/i386/boot/Makefile
+++ b/arch/i386/boot/Makefile
@@ -72,14 +72,19 @@ AFLAGS  := $(CFLAGS) -D__ASSEMBLY__
 
 SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))
 
-LDFLAGS_setup.elf  := -T
-$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS) FORCE
+$(obj)/zImage $(obj)/bzImage:  \
+   LDFLAGS :=  \
+   -R $(obj)/compressed/blob-syms  \
+   --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
+
+$(obj)/setup.elf: $(src)/setup.ld $(SETUP_OBJS)\
+   $(obj)/compressed/blob-syms FORCE
$(call if_changed,ld)
 
 $(obj)/payload.o:  EXTRA_AFLAGS := -Wa,-I$(obj)
 $(obj)/payload.o: $(src)/payload.S $(obj)/blob.bin
 
-$(obj)/compressed/blob: FORCE
+$(obj)/compressed/blob $(obj)/compressed/blob-syms: FORCE
$(Q)$(MAKE) $(build)=$(obj)/compressed IMAGE_OFFSET=$(IMAGE_OFFSET) $@
 
 # Set this if you want to pass append arguments to the zdisk/fdimage/isoimage kernel
===
--- a/arch/i386/boot/compressed/Makefile
+++ b/arch/i386/boot/compressed/Makefile
@@ -4,21 +4,42 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets:= blob vmlinux.bin vmlinux.bin.gz head.o misc.o piggy.o \
+targets:= blob vmlinux.bin vmlinux.bin.gz \
+   elfhdr.o head.o misc.o notes.o piggy.o \
vmlinux.bin.all vmlinux.relocs
 
-LDFLAGS_blob   := -T
 hostprogs-y:= relocs
 
 CFLAGS  := -m32 -D__KERNEL__ $(LINUX_INCLUDE) -O2 \
   -fno-strict-aliasing -fPIC \
   $(call cc-option,-ffreestanding) \
   $(call cc-option,-fno-stack-protector)
-LDFLAGS := -m elf_i386
+LDFLAGS := -R $(obj)/vmlinux-syms --defsym IMAGE_OFFSET=$(IMAGE_OFFSET) -T
 
-$(obj)/blob: $(src)/vmlinux.lds $(obj)/head.o $(obj)/misc.o $(obj)/piggy.o FORCE
+OBJS=$(addprefix $(obj)/,elfhdr.o head.o misc.o notes.o piggy.o)
+
+$(obj)/blob: $(src)/vmlinux.lds $(obj)/vmlinux-syms $(OBJS) FORCE
$(call if_changed,ld)
@:
+
+# Generate a stripped-down object including only the symbols needed
+# so that we can get them with ld -R. Direct stderr to /dev/null to
+# silence a useless warning.
+quiet_cmd_symextract = SYMEXT $@
+  cmd_symextract = objcopy -S \
+   $(addprefix -j,$(EXTRACTSECTS)) \
+   $(addprefix -K,$(EXTRACTSYMS)) \
+   $< $@ 2>/dev/null
+
+$(obj)/blob-syms: EXTRACTSYMS := blob_entry blob_payload
+$(obj)/blob-syms: EXTRACTSECTS := .text.head .data.compressed
+$(obj)/blob-syms: $(obj)/blob FORCE
+   $(call if_changed,symextract)
+
+$(obj)/vmlinux-syms: EXTRACTSYMS := __reserved_end
+$(obj)/vmlinux-syms: EXTRACTSECTS := .bss
+$(obj)/vmlinux-syms: vmlinux FORCE
+   $(call if_changed,symextract)
 
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
===

[PATCH RFC 0/7] proposed updates to boot protocol and paravirt booting

2007-06-06 Thread Jeremy Fitzhardinge
This series:
 1. Update the boot protocol to version 2.07
 2. Clean up the existing build process, to get rid of tools/build and
make the linker do more heavy lifting
 3. Make the bzImage payload an ELF file.  The bootloader can extract
this as a naked ELF file by skipping over boot_params.setup_sects worth
of 16-bit setup code.
 4. Update the boot_params to 2.07, and update the kernel's head.S to
jump to the appropriate subarch-specific kernel entrypoint.  The
very earliest code is common (copy boot_params, clear bss); the
split happens just before the initial pagetable setup.
+ random little changes to make it all hang together

This boots native for me, so everything basically works.  But I haven't
tested it end-to-end yet, because I haven't done the Xen bits yet.
Perhaps Rusty can do the lguest version to verify that it's all sound in
principle (hint hint ;).

So, how does it look?

J
-- 



Re: 2.6.22-rc4-mm1

2007-06-06 Thread Andrew Morton
On Wed, 6 Jun 2007 21:58:38 +0100 Grant Wilson <[EMAIL PROTECTED]> wrote:

> On Wednesday 06 June 2007 10:07:37 Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc4/2.6.22-rc4-mm1/
> 
> Patch 'usb-try-to-debug-bug-8561' triggers when I plug in a usb flash drive:

Cool, thanks.

> [10998.881000] usb 1-10: new high speed USB device using ehci_hcd and address 3
> [10999.001000] usb 1-10: new device found, idVendor=13fe, idProduct=1a00
> [10999.002000] usb 1-10: new device strings: Mfr=1, Product=2, SerialNumber=3
> [10999.016000] usb 1-10: Product: USB DISK 2.0
> [10999.025000] usb 1-10: Manufacturer:
> [10999.033000] usb 1-10: SerialNumber: 07720947018D
> [10999.034000] usb 1-10: configuration #1 chosen from 1 choice
> [10999.047000] scsi8 : SCSI emulation for USB Mass Storage devices
> [11004.055000] WARNING: at drivers/usb/core/urb.c:293 usb_submit_urb()
> [11004.055000]
> [11004.055000] Call Trace:
> [11004.055000]  [] dump_trace+0x43f/0x480
> [11004.055000]  [] show_trace+0x43/0x70
> [11004.055000]  [] dump_stack+0x15/0x20
> [11004.055000]  [] usb_submit_urb+0x224/0x240
> [11004.055000]  [] usb_sg_wait+0xd5/0x180
> [11004.055000]  [] usb_stor_bulk_transfer_sg+0xc4/0x120
> [11004.055000]  [] usb_stor_Bulk_transport+0x151/0x2e0
> [11004.055000]  [] usb_stor_invoke_transport+0x37/0x380
> [11004.055000]  [] usb_stor_transparent_scsi_command+0x9/0x10
> [11004.055000]  [] usb_stor_control_thread+0x18a/0x230
> [11004.055000]  [] kthread+0x4d/0x80
> [11004.055000]  [] child_rip+0xa/0x12
> 

Alan, you got a bite - reel her in!


Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Jesse Barnes
On Wednesday, June 6, 2007 4:15 pm Justin Piszcz wrote:
> On Wed, 6 Jun 2007, Randy Dunlap wrote:
> > On Wed, 6 Jun 2007 18:54:37 -0400 (EDT) Justin Piszcz wrote:
> >> Hm, not sure if it was from the patch or what but I ran this:
> >>
> >> 1. swapoff -a
> >> 2. ./eatmem
> >
> > You usually have to access the allocated memory, like:
> >
> > *d = 1.0;
> >
> > for it to actually be allocated (AFAIK).
> >
> >>}
> >>
> >>return 0;
> >> }
> >>
> >> Any idea why the OOM killer can or does not kill it?
> >
> > What are the values of /proc/sys/vm/overcommit* ?
> >
> > See Documentation/vm/overcommit-accounting .
>
> They should be the defaults as I do not change them:
>
> p34:~# find /proc/|grep -i overcommit
> /proc/sys/vm/overcommit_memory
> /proc/sys/vm/overcommit_ratio
> find: /proc/5128: No such file or directory
> p34:~# cat /proc/sys/vm/overcommit_memory
> 0
> p34:~# cat /proc/sys/vm/overcommit_ratio
> 50
> p34:~#
>
>
> Comments?

You can be sure your memory is available if reported in /proc/meminfo or 
at boot, since those represent the actual kernel data structures used 
for memory allocation:

[0.00] On node 0 totalpages: 2061783

That corresponds to 2061783*4k = 8445063168 bytes or ~8053M.  Is that 
fairly close to what's actually installed in the machine?
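The conversion is simply totalpages times the page size. A trivial C restatement, assuming 4 KiB pages as in the figure above, can be handy for checking other totalpages values:

```c
#include <stdint.h>

#define PAGE_BYTES 4096ULL	/* 4 KiB pages, as assumed above */

static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages * PAGE_BYTES;
}

static uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;	/* truncating division by 1024 * 1024 */
}
```

For 2061783 pages this gives 8445063168 bytes, which truncates to 8053 MiB.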

Note that your boot also mentions this:

[  106.449661] mtrr: no more MTRRs available

which indicates that things like X may not be able to map the 
framebuffer with the 'write-combine' attribute, which will hurt 
performance.  I've heard reports that turning off 'Intel QST fan 
control' in your BIOS settings will keep the BIOS from using up all of 
your MTRRs (it uses them improperly, probably another BIOS bug), so that 
X will perform well.  But if you don't use X on this machine, you don't 
have to worry about it.  The other option is to remap your MTRRs by hand 
to free one up for X; you can do that by combining the last one or two 
entries into a single MTRR using the API described in 
Documentation/mtrr.txt before you start X.
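Before combining entries by hand, it helps to check whether two ranges can actually be described by a single MTRR. The sketch below assumes the usual x86 rule for variable-range MTRRs (the size must be a power of two and the base a multiple of that size) and that both entries already have the same memory type. It does not touch /proc/mtrr, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* A variable-range MTRR can cover [base, base + size) only if size is
 * a power of two and base is aligned to size. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}

/* Can two same-type ranges be replaced by a single MTRR? */
static bool mtrr_can_merge(uint64_t base1, uint64_t size1,
			   uint64_t base2, uint64_t size2)
{
	if (base2 < base1) {		/* order the ranges by base address */
		uint64_t tb = base1, ts = size1;

		base1 = base2;
		size1 = size2;
		base2 = tb;
		size2 = ts;
	}
	if (base1 + size1 != base2)
		return false;		/* ranges are not contiguous */
	return mtrr_range_ok(base1, size1 + size2);
}
```

For example, two 1 GB entries at 2 GB and 3 GB can become one 2 GB entry at 2 GB. Two 1 GB entries starting at 1 GB cannot, because 1 GB is not 2 GB-aligned.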

Jesse


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Davide Libenzi
On Thu, 7 Jun 2007, Alan Cox wrote:

> > >   prctl(PR_SPARSEFD, 1);
> > > 
> > > to turn on sparse fd allocation for a process ?
> > 
> > There was a little discussion where I tried to whisper something similar, 
> > but Linus and Uli shot me :) - with good reasons IMO.
> > You may link to runtimes that are not non-sequential-fd aware, and will 
> > break them.
> 
> Linking to the correct version of a library and getting the library
> versioning right is not rocket science and isn't a sane excuse. It's no
> different to the stdio to large fd migration issues with many Unixen and
> they all coped just fine.

I don't think it's a matter of versioning. Many userspace libraries 
expect their fds to be compact (for many reasons - they use select, they 
use them to index 0-based arrays, etc...), and if the kernel suddenly 
starts returning values in the 1<<28-and-up range, they sure won't be happy.
So I believe that the correct way is that the caller specifically selects 
the feature, leaving the legacy fd allocation as default.



- Davide




Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Jesse Barnes
On Wednesday, June 6, 2007 4:24 pm Justin Piszcz wrote:
> > The mem= approach though looks slightly off, but I haven't looked
> > at x86_64's mem= handling to see why.  From a high level though,
> > adjusting end_pfn is the right thing to do, since theoretically
> > mem= could choose to make holes in your low memory and keep your
> > high memory in the allocation pools (though it's not generally
> > implemented this way).
> >
> > Jesse
>
> Ahh, ok!  Sounds great, I will keep running the kernel with your
> patch without mem= and let you know if I see any issues.
>
> Chances of getting this into 2.6.22-rc5?

I'm not sure it's appropriate for -rc5 since it mucks around with some 
early boot ordering, but I'll leave that to Andi, since it does address 
some real bugs people have been seeing.

Can we add your "Tested-by:  Justin Piszcz <[EMAIL PROTECTED]>" to 
the patch? :)

Thanks,
Jesse


Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Justin Piszcz



On Wed, 6 Jun 2007, Jesse Barnes wrote:

> On Wednesday, June 6, 2007 3:57 pm Justin Piszcz wrote:
>> On Wed, 6 Jun 2007, Jesse Barnes wrote:
>>> On Wednesday, June 6, 2007 3:26 pm Justin Piszcz wrote:
>>>> Nope, I booted with only netconsole= options.  I have a lot of HW
>>>> in the box and I guess the buffer is too small.  Not sure where to
>>>> change it in the kernel.  Looking..
>>>
>>> It's called "kernel log buffer size" and it's in "General setup".
>>>
>>> Jesse
>>
>> Did the dmesg output get you what you needed?  Why the few KB
>> difference?
>>
>> :)
>
> Yeah, looked at your e820 and your MTRR settings and I think my patch is
> doing the right thing (i.e. trimming just the right amount of memory,
> leaving you with as much as possible).
>
> The mem= approach though looks slightly off, but I haven't looked at
> x86_64's mem= handling to see why.  From a high level though, adjusting
> end_pfn is the right thing to do, since theoretically mem= could choose
> to make holes in your low memory and keep your high memory in the
> allocation pools (though it's not generally implemented this way).
>
> Jesse

Ahh, ok!  Sounds great, I will keep running the kernel with your patch 
without mem= and let you know if I see any issues.

Chances of getting this into 2.6.22-rc5?

Justin.


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Alan Cox wrote:
> Linking to the correct version of a library and getting the library
> versioning right is not rocket science and isn't a sane excuse. It's no
> different to the stdio to large fd migration issues with many Unixen and
> they all coped just fine.

This has nothing to do with linking and ABI.  The assumptions about
continuous allocation are part of the API.  It's required by POSIX and
provided by Unix since the early days.  There are entire code bases out
there which depend on this assumption.  Linking with code like this,
before or after the new version controlled symbol is introduced, will
break it.  Policies or stateful behavior, however you want to call it, is
just plain wrong for this (and most other things).

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGZ0G92ijCOnn/RHQRAkB9AJ93ol7XV2GiCw+8wgbJ9uMBnHU6dQCgmmAp
9m+WEup3iPkEHH6HIHDa88I=
=Dhto
-END PGP SIGNATURE-


Re: [BUG sparc64] 2.6.22-rc broke X on Ultra5

2007-06-06 Thread David Miller
From: David Miller <[EMAIL PROTECTED]>
Date: Wed, 30 May 2007 13:01:40 -0700 (PDT)

> From: Mikael Pettersson <[EMAIL PROTECTED]>
> Date: Wed, 30 May 2007 21:33:18 +0200 (MEST)
> 
> > You were spot on. 2.6.21 + patches up to but not including
> > the first one above works. Adding that one gave me a kernel
> > that wouldn't boot (hung after "uncompressing kernel").
> > Adding the second one above gave me a kernel that booted, but
> > where X failed to mmap() the frame buffer as I described.
> 
> Thanks for all of your testing.
> 
> I'll try to figure this out on one of my Ultra5's here.

So I did some more digging, got my ultra5 running and I can't
get it to work with either 2.6.20 or 2.6.22-rc4 :-)  This is
with xorg-7.2, and they both fail with:

xf86MapPciMem: Could not mmap ...

which I assume is the error you're seeing.

Investigation reveals that X.org is erroneously trying to
do PCI mmap ioctl()'s on /sys/bus/pci/devices/*/config
files which is very very wrong.  Again this happens for me
with both 2.6.20 and 2.6.22.

Can you just quickly strace "Xorg" startup in the working
and non-working case?  Just a simple:

strace -o x.log Xorg

would be fine for both cases.

You can email it to me privately, and I'll post here my
analysis with the relevant portions quoted so we don't
flood the list with strace dumps :-)

Thanks a lot!


Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Jesse Barnes
On Wednesday, June 6, 2007 3:57 pm Justin Piszcz wrote:
> On Wed, 6 Jun 2007, Jesse Barnes wrote:
> > On Wednesday, June 6, 2007 3:26 pm Justin Piszcz wrote:
> >> Nope, I booted with only netconsole= options.  I have a lot of HW
> >> in the box and I guess the buffer is too small.  Not sure where to
> >> change it in the kernel.  Looking..
> >
> > It's called "kernel log buffer size" and it's in "General setup".
> >
> > Jesse
>
> Did the dmesg output get you what you needed?  Why the few KB
> difference?
>
> :)

Yeah, looked at your e820 and your MTRR settings and I think my patch is 
doing the right thing (i.e. trimming just the right amount of memory, 
leaving you with as much as possible).

The mem= approach though looks slightly off, but I haven't looked at 
x86_64's mem= handling to see why.  From a high level though, adjusting 
end_pfn is the right thing to do, since theoretically mem= could choose 
to make holes in your low memory and keep your high memory in the 
allocation pools (though it's not generally implemented this way).

Jesse


Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Justin Piszcz



On Wed, 6 Jun 2007, Randy Dunlap wrote:

> On Wed, 6 Jun 2007 18:54:37 -0400 (EDT) Justin Piszcz wrote:
>
>> Hm, not sure if it was from the patch or what but I ran this:
>>
>> 1. swapoff -a
>> 2. ./eatmem
>
> You usually have to access the allocated memory, like:
>
> *d = 1.0;
>
> for it to actually be allocated (AFAIK).
>
>>    }
>>
>>    return 0;
>> }
>>
>> Any idea why the OOM killer can or does not kill it?
>
> What are the values of /proc/sys/vm/overcommit* ?
>
> See Documentation/vm/overcommit-accounting .

They should be the defaults as I do not change them:

p34:~# find /proc/|grep -i overcommit
/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio
find: /proc/5128: No such file or directory
p34:~# cat /proc/sys/vm/overcommit_memory
0
p34:~# cat /proc/sys/vm/overcommit_ratio
50
p34:~#

Comments?


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Alan Cox
> > prctl(PR_SPARSEFD, 1);
> > 
> > to turn on sparse fd allocation for a process ?
> 
> There was a little discussion where I tried to whisper something similar, 
> but Linus and Uli shot me :) - with good reasons IMO.
> You may link to runtimes that are not non-sequentialfd aware, and will 
> break them.

Linking to the correct version of a library and getting the library
versioning right is not rocket science and isn't a sane excuse. It's no
different to the stdio to large fd migration issues with many Unixen and
they all coped just fine.

Really all this new syscall hackery stuff is just too ugly to live. If
you use the prctl then yes we have a bit of library versioning to worry
about for the odd library that cares, but that's a one-off thing. The
alternative is a crappy zillion extra syscalls we have to support for
years and years just to save a little bit of userspace work.

At its most moronic it's no different to 32 and 64bit binary linking - and
the gnu tools manage to cope with stopping me linking a 32bit app to a
64bit lib and vice versa just fine, so I'm sure they can cope the same
way with sparse fd safe/non sparse fd safe libraries.



Re: Device hang when offlining a CPU due to IRQ misrouting

2007-06-06 Thread Darrick J. Wong
On Wed, Jun 06, 2007 at 12:35:14PM -0700, Siddha, Suresh B wrote:

> Weird. Then the bug can only happen if for some reason, "mask = map"
> didn't happen in fixup_irqs(). Can you send us the disassembly of the
> fixup_irqs()?

Attached.

--D
(gdb) disassemble fixup_irqs
Dump of assembler code for function fixup_irqs:
0x8020bf50 :  push   %rbp
0x8020bf51 :  mov    %rsp,%rbp
0x8020bf54 :  push   %r13
0x8020bf56 :  xor    %r13d,%r13d
0x8020bf59 :  push   %r12
0x8020bf5b :  push   %rbx
0x8020bf5c :  sub    $0x28,%rsp
0x8020bf60 :  mov    %rdi,0xffc0(%rbp)
0x8020bf64 :  mov    %rsi,0xffc8(%rbp)
0x8020bf68 :  jmp    0x8020bf73
0x8020bf6a :  inc    %r13d
0x8020bf6d :  cmp    $0x2,%r13d
0x8020bf71 :  je     0x8020bf6a
0x8020bf73 :  mov    %r13d,%r12d
0x8020bf76 :  lea    0xffd0(%rbp),%rbx
0x8020bf7a :  lea    0xffc0(%rbp),%rdx
0x8020bf7e :  shl    $0x8,%r12
0x8020bf82 :  mov    $0x80,%ecx
0x8020bf87 :  lea    0x805505f8(%r12),%rsi
0x8020bf8f :  mov    %rbx,%rdi
0x8020bf92 :  callq  0x802fb606 <__bitmap_and>
0x8020bf97 :  mov    %rbx,%rdi
0x8020bf9a :  callq  0x802fc6ad <__any_online_cpu>
0x8020bf9f :  add    $0xff80,%eax
0x8020bfa2 :  jne    0x8020bfc5
0x8020bfa4 :  mov    %r13d,%esi
0x8020bfa7 :  mov    $0x804a52b0,%rdi
0x8020bfae :  xor    %eax,%eax
0x8020bfb0 :  callq  0x80233d28
0x8020bfb5 :  mov    0xffc0(%rbp),%rax
0x8020bfb9 :  mov    %rax,0xffd0(%rbp)
0x8020bfbd :  mov    0xffc8(%rbp),%rax
0x8020bfc1 :  mov    %rax,0xffd8(%rbp)
0x8020bfc5 :  mov    0x80550588(%r12),%rax
0x8020bfcd :  mov    0x58(%rax),%rax
0x8020bfd1 :  test   %rax,%rax
0x8020bfd4 :  je     0x8020bfe5
0x8020bfd6 :  mov    0xffd0(%rbp),%rsi
0x8020bfda :  mov    0xffd8(%rbp),%rdx
0x8020bfde :  mov    %r13d,%edi
0x8020bfe1 :  callq  *%rax
0x8020bfe3 :  jmp    0x8020c013
0x8020bfe5 :  cmpq   $0x0,0x805505a8(%r12)
0x8020bfee :  je     0x8020c013
0x8020bff0 :  mov    5181486(%rip),%eax        # 0x806fd024
0x8020bff6 :  inc    %eax
0x8020bff8 :  mov    %eax,5181478(%rip)        # 0x806fd024
0x8020bffe :  dec    %eax
0x8020c000 :  jne    0x8020c013
0x8020c002 :  mov    %r13d,%esi
0x8020c005 :  mov    $0x804a52ce,%rdi
0x8020c00c :  xor    %eax,%eax
0x8020c00e :  callq  0x80233d28
0x8020c013 :  lea    0x1(%r13),%eax
0x8020c017 :  cmp    $0x10ff,%eax
0x8020c01c :  jbe    0x8020bf6a
0x8020c022 :  callq  0x8024e46e
0x8020c027 :  sti
0x8020c028 :  mov    $0x418958,%edi
0x8020c02d :  callq  0x803018cf <__const_udelay>
0x8020c032 :  cli
0x8020c033 :  callq  0x8024cf31
0x8020c038 :  add    $0x28,%rsp
0x8020c03c :  pop    %rbx
0x8020c03d :  pop    %r12
0x8020c03f :  pop    %r13
0x8020c041 :  leaveq
0x8020c042 :  retq




Re: [BUG] Fwd: segfault : modprobe dccp_probe/tcp_probe

2007-06-06 Thread Ian McDonald

On 6/7/07, Chuck Ebbert <[EMAIL PROTECTED]> wrote:

> On 06/06/2007 04:47 PM, Ian McDonald wrote:
>> Hi there,
>>
>> We've seen a report of a problem with dccp_probe as shown below. The
>> user has also verified that it occurs in tcp_probe as well. This is on
>> Dave Miller's tree but that currently tracks Linus' tree quite
>> closely. I do note that it is around 2.6.22-rc2 timeframe so there is
>> a possibility fixes may have gone in since.
>>
>
> It faulted when it tried to write the breakpoint instruction into the
> running kernel's executable code. Apparently the kernel code is now marked
> read-only?



Yes, that would appear to be the case, as the user has CONFIG_DEBUG_RODATA
set. Patrick - can you turn this off and retest? It's under "Kernel
Hacking" as "Write protect kernel read-only data structures".

The list of commits that I see around this are at:
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git=search=HEAD=commit=DEBUG_RODATA

I suspect it's probably one of the later ones, given the timing.

I guess there are a couple of solutions here - either make kprobes
conflict with CONFIG_DEBUG_RODATA so you can only have one or the other,
or look in more detail at what access kprobes needs.

Ian
--
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group


Re: [RFC] [Patch 4/4] lock contention tracking slimmed down

2007-06-06 Thread Ingo Molnar

* Martin Peschke <[EMAIL PROTECTED]> wrote:

> - lock_time_inc() vs. statistic_add_util()

please fix the coding style in lib/statistic.c. It's full of:

{
	unsigned long long i;
	if (value <= stat->u.histogram.range_min)
		return 0;

put a newline after variable sections.

and:

	on_each_cpu(_statistic_barrier, NULL, 0, 1);
	return 0;

preferably use a newline before 'return' statements as well. (this is 
not always necessary, but in the above case it looks better)

Ingo


Re: [PATCH] trim memory not covered by WB MTRRs

2007-06-06 Thread Randy Dunlap
On Wed, 6 Jun 2007 18:54:37 -0400 (EDT) Justin Piszcz wrote:

> Hm, not sure if it was from the patch or what but I ran this:
> 
> 1. swapoff -a
> 2. ./eatmem
> 
> The machine responded to ping and alt-sysrq-b but the box remained 
> unresponsive, I guess the kernel did not kill the process? :(
> 
> The moments before it 'froze'
> 
> top - 18:48:01 up 15 min,  7 users,  load average: 6.61, 18.50, 13.31
> Tasks: 200 total,  18 running, 182 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us, 90.7%sy,  0.0%ni,  5.9%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   8039576k total,  7998860k used,    40716k free,        8k buffers
> Swap:        0k total,        0k used,        0k free,     1664k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>248 root  11  -5 000 R   85  0.0   0:16.05 kswapd0
>   2265 nut   18   0 13320  2444 R   40  0.0   0:03.13 newhidups
>   2267 nut   18   0 12216  1684 R   40  0.0   0:02.04 upsd
>   2474 ntp   18   0 22192  4008 R   39  0.0   0:02.00 ntpd
>   3563 jpiszcz   18   0 41964 12644 R   38  0.0   0:02.20 pine
>   3530 root  18   0 96240 3132   36 R   37  0.0   0:02.09 kdm_greet
>   2052 root  18   0  6080  1124 R   37  0.0   0:02.00 hald-addon-stor
>   4479 war   17   0 18012  700  252 R   33  0.0   0:01.81 top
>   4480 war   19   0 6948m 6.8g4 R   22 88.4   0:05.81 eatmem
>   2095 root  18   0 13128  2168 R   10  0.0   0:00.50 dirmngr
>   2545 root  18   0 95788 24884 R5  0.0   0:00.25 apache2
>   3564 war   18   0 41620  8324 R5  0.0   0:00.34 pine
>   2270 nut   15   0 12212  1444 S1  0.0   0:00.05 upsmon
>561 root  10  -5 000 S0  0.0   0:00.02 xfsbufd
> 
> Very simple program:
> 
> #include 
> using namespace std;
> 
> int main()
> {
>long int interations = 1000;
>int counter = 1;
> 
>for(counter;counter{
>   double *d = new double[100];

You usually have to access the allocated memory, like:

*d = 1.0;

for it to actually be allocated (AFAIK).

>}
> 
>return 0;
> }
> 
> Any idea why the OOM killer can or does not kill it?

What are the values of /proc/sys/vm/overcommit* ?

See Documentation/vm/overcommit-accounting .

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread David Miller
From: Davide Libenzi <[EMAIL PROTECTED]>
Date: Wed, 6 Jun 2007 16:04:40 -0700 (PDT)

> On Wed, 6 Jun 2007, Alan Cox wrote:
> 
> > > The sys_accept() system call has been modified to return a file
> > > descriptor inside the non-sequential area, if the listening fd is.
> > > The sys_socketcall() system call has been also changed to support
> > > a new SYS_SOCKET2 identifier.
> > 
> > This still all seems really really ugly. Is there anything wrong with
> > throwing all these extra cases out and replacing the entire lot with
> > 
> > prctl(PR_SPARSEFD, 1);
> > 
> > to turn on sparse fd allocation for a process ?
> 
> There was a little discussion where I tried to whisper something similar, 
> but Linus and Uli shot me :) - with good reasons IMO.
> You may link to runtimes that are not non-sequentialfd aware, and will 
> break them.

Thanks for explaining this issue clearly instead of telling people
to "go read the archives" in a condescending manner like someone
else did.


Re: Big problems applying patch-2.6.21.3-rt9

2007-06-06 Thread Paul Mundt
On Wed, Jun 06, 2007 at 03:47:00PM -0700, Tim Bird wrote:
> Ingo,
> 
> I saw lots of problems trying to apply the latest rt-preempt patch.
> Maybe some bits got included by mistake in the patch?
> 
> Here's what I saw:
> $ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.21.3.tar.bz2
> $ wget http://people.redhat.com/mingo/realtime-preempt/patch-2.6.21.3-rt9
> $ tar -xjf linux-2.6.21.3.tar.bz2
> $ cd linux-2.6.21.3
> $ patch -p1 -f <../patch-2.6.21.3-rt9
> ... lots of errors, (many "already applied" errors if you don't use '-f')...

Use 2.6.21 proper. -rt9 includes the 2.6.21.3 changes, hence the
application failure. This was brought up a few days ago.


Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

2007-06-06 Thread Paul Mackerras
Linus Torvalds writes:

> So I think that the *right* place to clear TIF_SIGPENDING is actually in 
> "get_signal_to_deliver()", because that function is called _only_ by the 
> actual per-architecture "I'm going to deliver a signal now".

I agree that's the right place for real user processes, but I note
that there are drivers that have kernel threads that do basically
this:

	if (signal_pending(current))
		dequeue_signal(current, ...);

for example, drivers/block/nbd.c, and obviously they don't want to
still see signal_pending(current) after they have dequeued all the
pending signals.

Paul.


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Davide Libenzi
On Wed, 6 Jun 2007, Alan Cox wrote:

> > The sys_accept() system call has been modified to return a file
> > descriptor inside the non-sequential area, if the listening fd is.
> > The sys_socketcall() system call has been also changed to support
> > a new SYS_SOCKET2 identifier.
> 
> This still all seems really really ugly. Is there anything wrong with
> throwing all these extra cases out and replacing the entire lot with
> 
>   prctl(PR_SPARSEFD, 1);
> 
> to turn on sparse fd allocation for a process ?

There was a little discussion where I tried to whisper something similar, 
but Linus and Uli shot me :) - with good reasons IMO.
You may link to runtimes that are not non-sequentialfd aware, and will 
break them.



> Anyone needing to deal with certain special fds will use dup2() anyway so
> a task global switch seems to be cleaner and make the behaviour simply to
> flip on, with no extra calls (and you need to submit man pages for them
> all too), and also more importantly no new glibc stuff should be needed,
> and a process can try to set sparsefd, fail and carry on so it's more
> portable and back portable.

Man pages! Damn, I forgot Michael Kerrisk is already waiting for the other 
stuff :(



- Davide




Re: [RFC] [Patch 4/4] lock contention tracking slimmed down

2007-06-06 Thread Ingo Molnar

* Martin Peschke <[EMAIL PROTECTED]> wrote:

> The output has changed from a terribly wide table to an enormously 
> long list (just the generic way the statistics code prints data). 

Sigh, why don't you _ask_ before doing such stuff? It is a terribly wide 
table because that makes it easily greppable but still watchable in one 
chunk in a sufficiently zoomed out xterm. Please preserve this output 
format, quite some work went into it - NACK :-(

Ingo


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread David Miller
From: Ulrich Drepper <[EMAIL PROTECTED]>
Date: Wed, 06 Jun 2007 15:57:41 -0700

> I would strongly argue that any change we're doing in this area at
> userlevel would involve a new interface.  Programs also need new
> definitions from header files.  This means a recent enough glibc will
> be needed in any case.  Unless programs use their own definitions in
> which case they might as well use the syscall() function.

To be honest, after reading Alan's response a few moments ago
I'm growing in favor of his suggestions and that all of these
new system calls perhaps really are overkill.


Re: Big problems applying patch-2.6.21.3-rt9

2007-06-06 Thread Tim Bird
Thomas Gleixner wrote:
> On Wed, 2007-06-06 at 15:47 -0700, Tim Bird wrote:
>> Ingo,
>>
>> I saw lots of problems trying to apply the latest rt-preempt patch.
>> Maybe some bits got included by mistake in the patch?
>>
>> Here's what I saw:
>> $ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.21.3.tar.bz2
> 
> That should be 2.6.21.tar.bz2
> 
> The patch contains the .3 patch already. Unfortunately it used the wrong
> numbering scheme :(

I should have thought of that.  I noticed older patches were
against 2.6.21.

Thanks!
 -- Tim

=
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=



Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-06 Thread Ulrich Drepper
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Alan Cox wrote:
>   prctl(PR_SPARSEFD, 1);
> 
> to turn on sparse fd allocation for a process ?

Yes, there is.  Go back and read the archives.  It has been discussed in
depth.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFGZzwk2ijCOnn/RHQRAtKGAKCTX5njQnYeyDn4XUGFAZ3Ojai+mwCeN/j0
jibBDSqQpXhR2CwIQNRAnXw=
=sJb0
-END PGP SIGNATURE-


Re: Big problems applying patch-2.6.21.3-rt9

2007-06-06 Thread Thomas Gleixner
On Wed, 2007-06-06 at 15:47 -0700, Tim Bird wrote:
> Ingo,
> 
> I saw lots of problems trying to apply the latest rt-preempt patch.
> Maybe some bits got included by mistake in the patch?
> 
> Here's what I saw:
> $ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.21.3.tar.bz2

That should be 2.6.21.tar.bz2

The patch contains the .3 patch already. Unfortunately it used the wrong
numbering scheme :(

Sorry,

tglx



