Re: Real-Time Preemption and RCU

2005-03-17 Thread Ingo Molnar

* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> Seems to me that it would be good to have an RCU implementation that
> meets the requirements of the Real-Time Preemption patch, but that is
> 100% compatible with the "classic RCU" API.  Such an implementation
> must meet a number of requirements, which are listed at the end of
> this message (search for "REQUIREMENTS").

[ Wow. you must be a secret telepathic mind-reader - yesterday i was
  thinking about mailing you, because my approach to RCU preemptability
  (the API variants) clearly sucked and caused problems all around, both
  in terms of maintainability and in terms of stability and
  scalability. ]

> 5. The final version, which both scales and meets realtime
>   requirements, as well as exactly matching the "classic RCU"
>   API.
> 
> I have tested this approach, but in user-level scaffolding.  All of
> these implementations should therefore be regarded with great
> suspicion: untested, probably don't even compile.  Besides which, I
> certainly can't claim to fully understand the real-time preempt patch,
> so I am bound to have gotten something wrong somewhere.  In any case,
> none of these implementations are a suitable replacement for "classic
> RCU" on large servers, since they acquire locks in the RCU read-side
> critical sections. However, they should scale enough to support small
> SMP systems, inflicting only a modest performance penalty.

basically for PREEMPT_RT the only constraint is that RCU sections should
be preemptable. Whatever the performance cost. If PREEMPT_RT is merged
into the upstream kernel then it will (at least initially) be at a
status similar to NOMMU: it will be tolerated as long as it causes no
'drag' on the main code. The RCU API variants i introduced clearly
violated this requirement, and were my #1 worry wrt. upstream
mergability.

> I believe that implementation #5 is most appropriate for real-time
> preempt kernels. [...]

yeah, agreed - it looks perfect - both the read and write sides are
preemptable. Can i just plug the code you sent into rcupdate.c and
expect it to work, or would you like to send a patch? If you prefer you
can make it an unconditional patch against an upstream kernel to keep
things simple for you - i'll then massage it to be properly PREEMPT_RT
dependent.

> [...] In theory, #3 might be appropriate, but if I understand the
> real-time preempt implementation of reader-writer lock, it will not
> perform well if there are long RCU read-side critical sections, even
> in UP kernels.

all RCU-locked sections must be preemptable in -RT.  Basically RCU is a
mainstream API that is used by lots of code and will be introduced in
many other areas as well. The -RT kernel sees this as an
'uncontrollable latency source', which keeps introducing critical
sections. One major goal of PREEMPT_RT is to convert all popular
critical section APIs into preemptible sections, so that the amount of
code that is non-preemptable is drastically reduced and can be managed
(and thus can be trusted). This goal has a higher priority than any
performance consideration, because it doesn't matter what performance you
have, if you cannot trust the kernel to be deterministic.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/3] j_state_lock, j_list_lock, remove-bitlocks

2005-03-17 Thread Steven Rostedt


On Thu, 17 Mar 2005, Lee Revell wrote:
>
> OK, no need to cc: me on this one any more.  It's really low priority
> IMO compared to the big latencies I am seeing with ext3 and
> "data=ordered".  Unless you think there is any relation.
>

IMO a deadlock is higher priority than a big latency :-)

I still believe that the locking in ext3 has something to do with your
latencies, but I'll take you off when I send something to Andrew
or Ingo next time. Hopefully, they'll do the same.

When this problem is solved on Ingo's side, maybe this will solve your
latency problem, so I recommend that you keep trying the latest RT
kernels.  BTW what test are you running that causes these latencies?

-- Steve



Re: ppc64 build broke between 2.6.11-bk6 and 2.6.11-bk7

2005-03-17 Thread Martin J. Bligh


--Andrew Morton <[EMAIL PROTECTED]> wrote (on Thursday, March 17, 2005 22:44:09 -0800):

> "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
>> 
>> drivers/built-in.o(.text+0x182bc): In function `.matroxfb_probe':
>> : undefined reference to `.mac_vmode_to_var'
>> make: *** [.tmp_vmlinux1] Error 1
>> 
>> Anyone know what that is?
>> 
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm4/broken-out/fbdev-kconfig-fix-for-macmodes-and-ppc.patch
> 
> should fix it.
> 
> 

Thanks - will retest.

M.



Re: ppc64 build broke between 2.6.11-bk6 and 2.6.11-bk7

2005-03-17 Thread Andrew Morton
"Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
>
> drivers/built-in.o(.text+0x182bc): In function `.matroxfb_probe':
> : undefined reference to `.mac_vmode_to_var'
> make: *** [.tmp_vmlinux1] Error 1
> 
> Anyone know what that is?
> 

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm4/broken-out/fbdev-kconfig-fix-for-macmodes-and-ppc.patch

should fix it.


Re: BKCVS broken ?

2005-03-17 Thread Larry McVoy
On Thu, Mar 17, 2005 at 10:50:40PM -0700, Erik Andersen wrote:
> On Thu Mar 17, 2005 at 04:10:53PM -0800, Larry McVoy wrote:
> > I got swamped, I'll look at this after dinner.  But you might take a look
> > at this: http://www.bitkeeper.com/press/2005-03-17.html which is a link
> > to a very simple open source BK client.  It doesn't do much except track
> > the head of the tree but it does that well.  It's slightly better than
> > that, it puts all the checkin comments in BK/ChangeLog so you don't have
> > to go over the wire to get those.
> > 
> > It's intended for someone who just wants the latest and greatest snapshot,
> > knows how to do cp -rp and diff -Nur, it's pretty basic.  It's not a
> > CVS gateway replacement but it does work for every tree on bkbits.net.
> > Just to be clear, we are not dropping the CVS gateway, this is "in
> > addition to" not "instead of".
> 
> Thanks!  It's nice to finally have an open source tool for sucking
> down the latest and greatest directly from bk.  Thus far the tool
> is working perfectly at fetching source trees and at updating
> them when new patches are applied.

Great.  It _should_ just work, I tested it with patches that included
binaries which changed, it handles that.  I suspect we'll find some
case which doesn't work some day (symlinks can't be represented in 
a patch for example) but you can always reget things from scratch,
that will work for contents, permissions, symlinks, the works.

> One minor nit.  The name for the 'update' tool is a bit too
> generic...  

Hey, it's open source, I'm hoping that people will take that code and
evolve it to do whatever they need.  We're willing to do what we can on
this end if people need protocol changes to support new features, 
time permitting.  Think of that code as a prototype.  It's really
simple, you can hack it trivially.

If you want us to distribute your changes then send a patch, if not
that's cool too.  You can take that and evolve it to your heart's
content.  If you need a different license to start hacking let me
know what you want, I really don't care, you can have that code 
as public domain if you like.
-- 
---
Larry McVoy                lm at bitmover.com           http://www.bitkeeper.com


ppc64 build broke between 2.6.11-bk6 and 2.6.11-bk7

2005-03-17 Thread Martin J. Bligh
drivers/built-in.o(.text+0x182bc): In function `.matroxfb_probe':
: undefined reference to `.mac_vmode_to_var'
make: *** [.tmp_vmlinux1] Error 1

Anyone know what that is?

M.



[BKPATCH] ACPI for 2.6.12-rc1

2005-03-17 Thread Len Brown
Hi Linus, please do a 

bk pull bk://linux-acpi.bkbits.net/to-linus

This includes the ACPI part of memory hotplug,
plus various fixes, BIOS workarounds and a fix for
an interpreter regression we had in 2.6.11 vs 2.6.10.

All changes here have been through Andrew's mm tree.

thanks,
-Len

ps. a plain patch is also available here:
ftp://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.11/acpi-20050309-2.6.11.diff.gz

This will update the following files:

 arch/i386/kernel/acpi/sleep.c       |   3
 arch/ia64/kernel/acpi.c             |   2
 drivers/acpi/Kconfig                |  20
 drivers/acpi/Makefile               |   1
 drivers/acpi/ac.c                   |  18
 drivers/acpi/acpi_memhotplug.c      | 542
 drivers/acpi/battery.c              |   2
 drivers/acpi/button.c               |   4
 drivers/acpi/container.c            |  15
 drivers/acpi/debug.c                |   4
 drivers/acpi/dispatcher/dsmethod.c  |  11
 drivers/acpi/dispatcher/dsopcode.c  |   8
 drivers/acpi/dispatcher/dsutils.c   | 166 +--
 drivers/acpi/dispatcher/dswexec.c   |  61 ++
 drivers/acpi/ec.c                   |   2
 drivers/acpi/events/evxface.c       |   4
 drivers/acpi/executer/exmisc.c      |   5
 drivers/acpi/executer/exoparg2.c    |   6
 drivers/acpi/executer/exresolv.c    |   6
 drivers/acpi/executer/exstoren.c    |   7
 drivers/acpi/executer/exstorob.c    |  27 -
 drivers/acpi/fan.c                  |  33 -
 drivers/acpi/ibm_acpi.c             |   4
 drivers/acpi/numa.c                 |   2
 drivers/acpi/osl.c                  |  10
 drivers/acpi/parser/psopcode.c      |   2
 drivers/acpi/parser/psparse.c       |  42 +
 drivers/acpi/parser/pswalk.c        | 254 +--
 drivers/acpi/pci_irq.c              |  38 +
 drivers/acpi/pci_link.c             |  14
 drivers/acpi/pci_root.c             |   4
 drivers/acpi/power.c                |  10
 drivers/acpi/processor_core.c       |   6
 drivers/acpi/processor_thermal.c    |   2
 drivers/acpi/processor_throttling.c |   2
 drivers/acpi/resources/rsaddr.c     | 146 +++---
 drivers/acpi/resources/rscalc.c     |  14
 drivers/acpi/resources/rsdump.c     |  23 -
 drivers/acpi/resources/rslist.c     |   1
 drivers/acpi/scan.c                 |  47 +-
 drivers/acpi/thermal.c              |   2
 drivers/acpi/toshiba_acpi.c         |   2
 drivers/acpi/utilities/utcopy.c     |  19
 drivers/acpi/utilities/utdelete.c   |  18
 drivers/acpi/utilities/utglobal.c   |  10
 drivers/acpi/utilities/utmisc.c     |  44 +
 drivers/acpi/video.c                |   2
 drivers/pnp/pnpacpi/rsparser.c      |   9
 include/acpi/acconfig.h             |   4
 include/acpi/acdisasm.h             |   5
 include/acpi/acdispat.h             |  10
 include/acpi/acinterp.h             |   1
 include/acpi/aclocal.h              |   4
 include/acpi/acpi_bus.h             |   1
 include/acpi/acpi_drivers.h         |   3
 include/acpi/acstruct.h             |   1
 include/acpi/actbl.h                |   4
 include/acpi/actbl2.h               |  79 +++
 include/acpi/actypes.h              |  33 -
 include/acpi/platform/acenv.h       |   2
 include/acpi/processor.h            |   2
 include/linux/acpi.h                |   2
 62 files changed, 1301 insertions(+), 524 deletions(-)

through these ChangeSets:

<[EMAIL PROTECTED]> (05/03/17 1.2213)
   [ACPI] build fix in acpi_pci_irq_disable()
   
   bk-acpi-acpi_pci_irq_disable-build-fix.patch
   
   Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
   Signed-off-by: Len Brown <[EMAIL PROTECTED]>

<[EMAIL PROTECTED]> (05/03/09 1.1938.505.27)
   [ACPI] ACPICA 20050309 from Bob Moore
   
   The string-to-buffer implicit conversion code has been
   modified again after a change to the ACPI specification.
   In order to match the behavior of the other major ACPI
   implementation, the target buffer is no longer truncated
   if the source string is smaller than an existing target
   buffer. This change requires an update to the ACPI spec,
   and should eliminate the recent AE_AML_BUFFER_LIMIT issues.
   
   The "implicit return" support was rewritten to a new
   algorithm that solves the general case. Rather than
   attempt to determine when a method is about to exit,
   the result of every ASL operator is saved momentarily
   until the very next ASL operator is executed. Therefore,
   no matter how the method exits, there will always be a
   saved implicit return value.  This feature is only enabled
   with the acpi_gbl_enable_interpreter_slack flag which
   Linux enables unless "acpi=strict".  This should
   eliminate AE_AML_NO_RETURN_VALUE errors.
   
   Implemented implicit conversion support for the predicate
   (operand) of the If, Else, and While operators. String and
   Buffer arguments are automatically converted to Integers.
   
   Changed the string-to-integer 

Re: [PATCH] Automatically append a semi-random version for BK users

2005-03-17 Thread Ryan Anderson
Sam, this version includes the CVS portion.

Automatically append a semi-random version if the tree we're building
isn't tagged in BitKeeper or CVS and CONFIG_LOCALVERSION_AUTO is set.

This fixes the case where Linus (or someone else) does a release and tags
it, someone else builds that release tree (e.g., 2.6.11) and
installs it.  Later, before another release occurs (e.g., -rc1), another
build happens, and the actual, released 2.6.11 is overwritten with the
-current tree.

This currently supports BitKeeper and CVS (assuming the format is the
same as the BK->CVS tree).
 
Signed-Off-By: Ryan Anderson <[EMAIL PROTECTED]>


Index: local-quilt/Makefile
===
--- local-quilt.orig/Makefile   2005-03-14 20:53:59.0 -0500
+++ local-quilt/Makefile2005-03-14 20:54:02.0 -0500
@@ -549,6 +549,26 @@ export KBUILD_IMAGE ?= vmlinux
 # images. Default is /boot, but you can set it to other values
 export INSTALL_PATH ?= /boot
 
+# If CONFIG_LOCALVERSION_AUTO is set, we automatically perform some tests
+# and try to determine if the current source tree is a release tree, of any sort,
+# or if is a pure development tree.
+#
+# A 'release tree' is any tree with a BitKeeper, or other SCM, TAG associated
+# with it.  The primary goal of this is to make it safe for a native
+# BitKeeper/CVS/SVN user to build a release tree (i.e, 2.6.9) and also to
+# continue developing against the current Linus tree, without having the Linus
+# tree overwrite the 2.6.9 tree when installed.
+#
+# Currently, only BitKeeper is supported.
+# Other SCMs can edit scripts/setlocalversion and add the appropriate
+# checks as needed.
+
+
+ifdef CONFIG_LOCALVERSION_AUTO
+   localversion-auto := $(shell $(PERL) $(srctree)/scripts/setlocalversion $(srctree))
+   LOCALVERSION := $(LOCALVERSION)$(localversion-auto)
+endif
+
 #
 # INSTALL_MOD_PATH specifies a prefix to MODLIB for module directory
 # relocations required by build roots.  This is not defined in the
Index: local-quilt/init/Kconfig
===
--- local-quilt.orig/init/Kconfig   2005-03-14 20:53:59.0 -0500
+++ local-quilt/init/Kconfig2005-03-17 23:49:44.0 -0500
@@ -69,6 +69,24 @@ config LOCALVERSION
  object and source tree, in that order.  Your total string can
  be a maximum of 64 characters.
 
+config LOCALVERSION_AUTO
+   bool "Automatically append version information to the version string"
+   default y
+   help
+ This will try to automatically determine if the current tree is a
+ release tree by looking for BitKeeper or CVS tags that
+ belong to the current top of tree revision.
+
+ A string of the format -BK will be added to the localversion
+ if a BitKeeper based tree is found.  The string -cvs-$version will be
+ added to the localversion if a CVS tree based on the BK->CVS tree is
+ found.  The string generated by this will be appended after any
+ matching localversion* files, and after the value set in
+ CONFIG_LOCALVERSION
+
+ Note: This requires Perl and the Digest::MD5 module, as well
+ as BitKeeper and/or CVS.
+
 config SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU
Index: local-quilt/scripts/setlocalversion
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ local-quilt/scripts/setlocalversion 2005-03-17 23:02:02.0 -0500
@@ -0,0 +1,120 @@
+#!/usr/bin/perl
+# Copyright 2004 - Ryan Anderson <[EMAIL PROTECTED]>  GPL v2
+
+use strict;
+use warnings;
+use Digest::MD5;
+require 5.006;
+
+if (@ARGV != 1) {
+   print <
+EOT
+   exit(1);
+}
+
+my ($srctree) = @ARGV;
+
+my @LOCALVERSIONS = ();
+
+# BitKeeper Version Checks
+
+# We are going to use the following commands to try and determine if this
+# repository is at a Version boundary (i.e, 2.6.10 vs 2.6.10 + some patches) We
+# currently assume that all meaningful version boundaries are marked by a tag.
+# We don't care what the tag is, just that something exists.
+#
+# The process is as follows:
+#
+# 1. Get the key of the top of tree changeset:
+#  cset=`bk changes -r+ -k`
+#This will be something like:
+#[EMAIL PROTECTED]|ChangeSet|20050314010036|43252
+#
+# 2. Get the tag, if any, associated with it:
+#   bk prs -h -d':TAG:\n' -r$cset
+#
+# 3. If no such tag exists, take the hex-encoded md5sum of the
+# changeset key, extract the first 8 characters of it, and add
+# -BK and the above 8 characters to the end of the version.
+
+sub do_bk_checks {
+   chdir($srctree);
+   my $changeset = `bk changes -r+ -k`;
+   chomp $changeset; # strip trailing \n safely
+   my $tag = `bk prs -h -d':TAG:' -r'$changeset'`;
+
+   if (length($tag) == 0) {
+   # There is no tag 

Re: BKCVS broken ?

2005-03-17 Thread Erik Andersen
On Thu Mar 17, 2005 at 04:10:53PM -0800, Larry McVoy wrote:
> I got swamped, I'll look at this after dinner.  But you might take a look
> at this: http://www.bitkeeper.com/press/2005-03-17.html which is a link
> to a very simple open source BK client.  It doesn't do much except track
> the head of the tree but it does that well.  It's slightly better than
> that, it puts all the checkin comments in BK/ChangeLog so you don't have
> to go over the wire to get those.
> 
> It's intended for someone who just wants the latest and greatest snapshot,
> knows how to do cp -rp and diff -Nur, it's pretty basic.  It's not a
> CVS gateway replacement but it does work for every tree on bkbits.net.
> Just to be clear, we are not dropping the CVS gateway, this is "in
> addition to" not "instead of".

Thanks!  It's nice to finally have an open source tool for sucking
down the latest and greatest directly from bk.  Thus far the tool
is working perfectly at fetching source trees and at updating
them when new patches are applied.

One minor nit.  The name for the 'update' tool is a bit too
generic...  For example old (old) linux systems have an
/sbin/update util for flushing buffers, and I have plenty of
'update' scripts lying around doing odd jobs.   Perhaps a rename
to 'sfioup' might be a good idea, as that is sufficiently obscure
there is little chance of a naming collision.

 -Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


Re: [PATCH] Prezeroing V8 + free_hot_zeroed_page + free_cold_zeroed page

2005-03-17 Thread Nish Aravamudan
On Thu, 17 Mar 2005 18:09:11 -0800 (PST), Christoph Lameter
<[EMAIL PROTECTED]> wrote:
> On Thu, 17 Mar 2005, Jason Uhlenkott wrote:
> 
> > On Thu, Mar 17, 2005 at 05:36:50PM -0800, Christoph Lameter wrote:
> > > +while (avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT)) {
> > > +   set_current_state(TASK_UNINTERRUPTIBLE);
> > > +   schedule_timeout(30*HZ);
> > > +   }
> >
> > This should probably be TASK_INTERRUPTIBLE.  It'll never actually get
> > interrupted either way since kernel threads block all signals, but
> > sleeping uninterruptibly contributes to the load average.
> 
> Correct.  I just do not seem to be able to get this right.

I think msleep_interruptible(30000) would be your best choice, then.
Maybe with a comment that you don't actually expect signals, but are
using TASK_INTERRUPTIBLE to avoid contributing to load average (that
way, if the loadavg calculation changes someday, somebody will be able
to change your sleep over appropriately).

Thanks,
Nish
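
The two details discussed in this message can be checked in isolation: the
loop condition compares a fixed-point load average against a threshold
shifted by FSHIFT, and a millisecond-based sleep needs the 30*HZ jiffies from
the quoted loop expressed in milliseconds. Below is a minimal user-space
sketch in C, assuming FSHIFT is 11 as in the kernel's load-average code.
load_too_high() and jiffies_to_ms() are hypothetical helper names used only
for illustration; they are not kernel functions and not part of the patch.

```c
#include <stdbool.h>

/* Fixed-point format used for the kernel's load averages (assumed: 11 bits). */
#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */

/*
 * Mirrors the condition of the quoted while loop: true while the 1-minute
 * load average (avenrun[0], fixed point) is at or above the integer
 * threshold taken from sysctl_scrub_load.
 */
bool load_too_high(unsigned long avenrun0, unsigned long scrub_load)
{
    return avenrun0 >= (scrub_load << FSHIFT);
}

/*
 * Hypothetical helper: convert a timeout in jiffies to milliseconds for a
 * given tick rate hz, which is the unit a millisecond-based sleep expects.
 */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    return jiffies * 1000UL / hz;
}
```

With a threshold of 2, the raw value compared against avenrun[0] is
2 << 11 = 4096, and 30*HZ jiffies corresponds to 30000 ms at any tick rate.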


Network Driver Name (ATM Driver)

2005-03-17 Thread Subbu

Hi

I have a network driver (Ethernet). When I install it with insmod, the
driver loads successfully, and "ifconfig -a" shows the interface as eth0,
eth1, etc. (the name comes from the name field of the net_device
structure).

My ATM driver also loads without any problem, but "ifconfig -a" does not
show anything for it, unlike the Ethernet case. (I expected something like
atm0, atm1, ... Am I correct?)

How can I get the driver name to show up in "ifconfig -a" for a PPPoATM
driver? Which function needs to be included in the code to get this?
Kindly reply to this mail ID and [EMAIL PROTECTED]



Thanks in Advance
Subbu


Re: [patch 12/12] scripts/mod/sumversion.c: replace strtok() with strsep()

2005-03-17 Thread Sam Ravnborg
On Fri, Mar 18, 2005 at 03:46:20AM +0100, Nicolas Kaiser wrote:
> * Sam Ravnborg <[EMAIL PROTECTED]>:
> 
> > On Sat, Mar 05, 2005 at 04:35:45PM +0100, [EMAIL PROTECTED] wrote:
> > > 
> > > Replaces strtok() with strsep()
> > 
> > Why - does it increase portability?
> 
>  "strtok() is not thread and SMP safe and strsep() should be
> used instead"
> 
> http://janitor.kernelnewbies.org/docs/driver-howto.html#3.3.1

It does not matter in this particular file.
But applied for consistency (so it does not show up if you grep for it).

Sam
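
For readers unfamiliar with the difference behind the janitor note quoted
above: strtok() keeps its parsing position in hidden static state, while
strsep() keeps it in a pointer the caller owns. The following is a minimal
sketch of the strsep() calling pattern. sketch_strsep() is a local stand-in
with the same contract (assumed here so the example builds with any C99
compiler, since strsep() itself is a BSD/glibc extension), and count_fields()
is a hypothetical helper, not code from scripts/mod/sumversion.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-in with strsep()'s contract: return the next field of *stringp,
 * terminate it in place, and advance *stringp past the delimiter (or set
 * it to NULL once the last field has been returned).
 */
char *sketch_strsep(char **stringp, const char *delim)
{
    char *start = *stringp;
    char *end;

    if (start == NULL)
        return NULL;
    end = strpbrk(start, delim);
    if (end != NULL)
        *end++ = '\0';
    *stringp = end;
    return start;
}

/* Count the non-empty fields in line; line is modified in place. */
size_t count_fields(char *line, const char *delim)
{
    char *cursor = line;   /* all parsing state lives in this local */
    char *field;
    size_t n = 0;

    while ((field = sketch_strsep(&cursor, delim)) != NULL) {
        if (*field != '\0')   /* adjacent delimiters yield empty fields */
            n++;
    }
    return n;
}
```

Unlike strtok(), strsep() returns empty fields for adjacent delimiters, so
callers that relied on strtok() skipping them need an explicit check like the
one in count_fields().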


Re: 32Bit vs 64Bit

2005-03-17 Thread Bodo Eggert
regatta <[EMAIL PROTECTED]> wrote:

> My question is because we ran a 32-bit application on a Sun AMD64 Opteron
> with a 1GB connection (kernel 2.4 x86_64 with 8 GB memory & 2 CPUs), and
> we had a hard time with it because the user tried to have the
> application process data on a NAS box (on the network), and that
> made the machine use more than 60% of the NAS CPU, and no one else
> was able to access the NAS

Does the application happen to frequently access the data in small chunks
randomly scattered across the file(s)?



Re: [PATCH 2/2] Thinkpad Suspend Powersave: Add D2 power saving code for Thinkpads with Radeon video chipsets

2005-03-17 Thread Benjamin Herrenschmidt
On Thu, 2005-03-17 at 22:39 -0500, Theodore Ts'o wrote:
> On Thu, Mar 17, 2005 at 10:19:04AM +1100, Benjamin Herrenschmidt wrote:
> > You probably want to remove the bit that does
> > 
> > OUTREG(TV_DAC_CNTL, INREG(TV_DAC_CNTL) | 0x0700);
> > 
> > Or you'll lose TV output :)
> 
> I'm not using TV output, and the original patch stated:
> 
> > > + /* Power down TV DAC, that saves a significant amount of power,
> > > +  * we'll have something better once we actually have some TVOut
> > > +  * support
> > > +  */

Yup, I know, I wrote this bit :)

> I suppose I should re-enable the TV DAC and see how much power it
> actually consumes if I enable it.  It would seem to me that we should
> have a way that we can power down whatever parts of the video chipset
> that we're not using.  (For example if I don't have anything connected
> to the VGA output, it would be good if we could power that down too...)

We can power down the internal DAC too, yes, and the TMDS transmitter
when no DVI is plugged, etc.. and we can also lower the chip clock :) I
do intend to do these things. The problem right now is
that the above will break some users who have a BIOS that can set
TV-Out. Maybe some sysfs attribute ? At least until I can properly
probe all ports including the TV Out (I'm working on that). Ultimately,
the driver should be able to properly detect everything that is
connected.

Ben.



Re: [PATCH] Xen/i386 cleanups - AGP bus/phys cleanups

2005-03-17 Thread Rik van Riel
On Fri, 18 Mar 2005, Paul Mackerras wrote:

> However, the idea of having phys_to_agp/agp_to_phys (or 
> virt_to_agp/agp_to_virt) sounds like it wouldn't be too much effort, if 
> it would help Xen.

It would be absolutely trivial.  On most architectures you would have:

#define virt_to_agp  virt_to_phys
#define agp_to_virt  phys_to_virt

On Xen you would have:

#define virt_to_agp  virt_to_bus
#define agp_to_virt  bus_to_virt

Or, more likely, defined to arbitrary_machine_to_phys
or whatever it was called ;)

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


Re: [ANNOUNCE][PATCH] drivers/scsi/megaraid/megaraid_{mm,mbox}

2005-03-17 Thread Andrew Morton
"Ju, Seokmann" <[EMAIL PROTECTED]> wrote:
>
> Here, I'm sending another patch that has
>  fix for this issue.

It is still wordwrapped.

Please fix your email client, email the patch to yourself, ensure that the
result still applies, then resend it with a full description.

Thanks.



[PATCH 4/6] kobject/hotplug split - devices core

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore. Do it
ourselves if we are finished populating the device directory.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

--- 1.91/drivers/base/core.c2004-11-12 13:16:42 +01:00
+++ edited/drivers/base/core.c  2005-03-18 02:17:17 +01:00
@@ -260,6 +260,8 @@ int device_add(struct device *dev)
/* notify platform of device entry */
if (platform_notify)
platform_notify(dev);
+
+   kobject_hotplug(&dev->kobj, KOBJ_ADD);
  Done:
put_device(dev);
return error;
@@ -349,6 +351,7 @@ void device_del(struct device * dev)
platform_notify_remove(dev);
bus_remove_device(dev);
device_pm_remove(dev);
+   kobject_hotplug(&dev->kobj, KOBJ_REMOVE);
kobject_del(&dev->kobj);
if (parent)
put_device(parent);



[PATCH 3/6] kobject/hotplug split - block core

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore. Do it
ourselves if we are finished populating the device directory.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

= fs/partitions/check.c 1.129 vs edited =
--- 1.129/fs/partitions/check.c 2005-01-31 07:33:40 +01:00
+++ edited/fs/partitions/check.c2005-03-18 02:17:18 +01:00
@@ -337,6 +337,7 @@ void register_disk(struct gendisk *disk)
if ((err = kobject_add(&disk->kobj)))
return;
disk_sysfs_symlinks(disk);
+   kobject_hotplug(&disk->kobj, KOBJ_ADD);
 
/* No minors to use for partitions */
if (disk->minors == 1) {
@@ -441,5 +442,6 @@ void del_gendisk(struct gendisk *disk)
sysfs_remove_link(&disk->driverfs_dev->kobj, "block");
put_device(disk->driverfs_dev);
}
+   kobject_hotplug(&disk->kobj, KOBJ_REMOVE);
kobject_del(&disk->kobj);
 }



RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Paul Mackerras
Nguyen, Tom L writes:

> We decided to implement PCI Express error handling based on the PCI
> Express specification in a platform independent manner.  This allows any
> platform that implements PCI Express AER per the PCI SIG specification
> to take advantage of the advanced features, much like SHPC hot-plug or
> PCI Express hot-plug implementations.

Does the PCI Express AER specification define an API for drivers?

> For PCI Express the endpoint device driver can take recovery action on
> its own, depending on the nature of the error so long as it does not
> affect the upstream device.  This can include endpoint device resets.

Likewise, with EEH the device driver could take recovery action on its
own.  But we don't want to end up with multiple sets of recovery code
in drivers, if possible.  Also we want the recovery code to be as
simple as possible, otherwise driver authors will get it wrong.

> To support the AER driver calling an upstream device to initiate a reset
> of the link we need a specific callback since the driver doing the reset
> is not the driver who got the error.  In the case of general PCI this

I would see the AER driver as being included in the "platform" code.
The AER driver would be closely involved in the recovery process.

What is the state of a link during the time between when an error is
detected and when a link reset is done?  Is the link usable?  What
happens if you try to do a MMIO read from a device downstream of the
link?

Regards,
Paul.


[PATCH 5/6] kobject/hotplug split - usb cris

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore.
We need to do it ourselves now.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

= drivers/usb/host/hc_crisv10.c 1.7 vs edited =
--- 1.7/drivers/usb/host/hc_crisv10.c   2004-12-21 02:15:10 +01:00
+++ edited/drivers/usb/host/hc_crisv10.c   2005-03-18 02:17:17 +01:00
@@ -4396,6 +4396,7 @@ static int __init etrax_usb_hc_init(void
 device_initialize(_device);
 kobject_set_name(_device.kobj, "etrax_usb");
 kobject_add(_device.kobj);
+kobject_hotplug(_device.kobj, KOBJ_ADD);
 hc->bus->controller = _device;
usb_register_bus(hc->bus);
 



[PATCH 2/6] kobject/hotplug split - class core

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore. Do it
ourselves if we are finished populating the device directory.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

= drivers/base/class.c 1.61 vs edited =
--- 1.61/drivers/base/class.c   2005-03-15 17:52:00 +01:00
+++ edited/drivers/base/class.c 2005-03-18 02:17:17 +01:00
@@ -491,6 +491,7 @@ int class_device_add(struct class_device
up(>sem);
}
 
+   kobject_hotplug(&class_dev->kobj, KOBJ_ADD);
  register_done:
if (error && parent)
class_put(parent);
@@ -562,6 +563,7 @@ void class_device_del(struct class_devic
}
class_device_remove_attrs(class_dev);
 
+   kobject_hotplug(&class_dev->kobj, KOBJ_REMOVE);
kobject_del(&class_dev->kobj);
 
if (parent)



[PATCH 6/6] kobject/hotplug split - net bridge

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore.
We need to do it ourselves now.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

= net/bridge/br_sysfs_if.c 1.2 vs edited =
--- 1.2/net/bridge/br_sysfs_if.c    2004-06-18 22:15:34 +02:00
+++ edited/net/bridge/br_sysfs_if.c 2005-03-18 02:17:18 +01:00
@@ -248,6 +248,7 @@ int br_sysfs_addif(struct net_bridge_por
if (err)
goto out2;
 
+   kobject_hotplug(&p->kobj, KOBJ_ADD);
return 0;
  out2:
kobject_del(&p->kobj);
@@ -259,6 +260,7 @@ void br_sysfs_removeif(struct net_bridge
 {
pr_debug("br_sysfs_removeif\n");
sysfs_remove_link(&p->br->ifobj, p->dev->name);
+   kobject_hotplug(&p->kobj, KOBJ_REMOVE);
kobject_del(&p->kobj);
 }
 



[PATCH 1/6] kobject/hotplug split - kobject add/remove

2005-03-17 Thread Kay Sievers
kobject_add() and kobject_del() don't emit hotplug events anymore.
The user should do it itself if it has finished populating the device
directory.

Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

= lib/kobject.c 1.58 vs edited =
--- 1.58/lib/kobject.c  2005-03-09 18:04:09 +01:00
+++ edited/lib/kobject.c    2005-03-18 02:17:18 +01:00
@@ -184,8 +184,6 @@ int kobject_add(struct kobject * kobj)
unlink(kobj);
if (parent)
kobject_put(parent);
-   } else {
-   kobject_hotplug(kobj, KOBJ_ADD);
}
 
return error;
@@ -207,7 +205,8 @@ int kobject_register(struct kobject * ko
printk("kobject_register failed for %s (%d)\n",
   kobject_name(kobj),error);
dump_stack();
-   }
+   } else
+   kobject_hotplug(kobj, KOBJ_ADD);
} else
error = -EINVAL;
return error;
@@ -301,7 +300,6 @@ int kobject_rename(struct kobject * kobj
 
 void kobject_del(struct kobject * kobj)
 {
-   kobject_hotplug(kobj, KOBJ_REMOVE);
sysfs_remove_dir(kobj);
unlink(kobj);
 }
@@ -314,6 +312,7 @@ void kobject_del(struct kobject * kobj)
 void kobject_unregister(struct kobject * kobj)
 {
pr_debug("kobject %s: unregistering\n",kobject_name(kobj));
+   kobject_hotplug(kobj, KOBJ_REMOVE);
kobject_del(kobj);
kobject_put(kobj);
 }



[PATCH 0/6] kobject/hotplug split

2005-03-17 Thread Kay Sievers
This splits the implicit generation of hotplug events from
kobject_add() and kobject_del(), to give the driver core
control over the time the event is created.

The kobject_register() and unregister functions still have the same
behavior and emit the events by themselves.

The class, block and device cores are now changed to emit the hotplug
event _after_ the "dev" file, the "device" symlink and the default
attributes are created. This saves udev from spinning in a stat() loop
waiting for the files to appear, which is expensive if we have a lot of
concurrent events.
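
To make the intended ordering concrete, here is a small user-space model of
the registration path after the split. It is only a sketch: the model_*
functions and the trace structure are hypothetical stand-ins for
kobject_add(), the attribute/symlink creation and kobject_hotplug(), and
nothing here is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* The three things a registration path does, in the order they happen. */
enum step { STEP_ADD, STEP_ATTRS, STEP_HOTPLUG };

#define MAX_STEPS 8

/* Records the order of operations so it can be inspected afterwards. */
struct trace {
	enum step steps[MAX_STEPS];
	size_t n;
};

static void record(struct trace *t, enum step s)
{
	assert(t->n < MAX_STEPS);
	t->steps[t->n++] = s;
}

/* Hypothetical stand-in for kobject_add(): no event is emitted here. */
static void model_kobject_add(struct trace *t)
{
	record(t, STEP_ADD);
}

/* Hypothetical stand-in for creating "dev", "device" and default attributes. */
static void model_create_attrs(struct trace *t)
{
	record(t, STEP_ATTRS);
}

/* Hypothetical stand-in for kobject_hotplug(kobj, KOBJ_ADD). */
static void model_kobject_hotplug(struct trace *t)
{
	record(t, STEP_HOTPLUG);
}

/* Registration after the split: the event goes out last. */
void model_device_register(struct trace *t)
{
	model_kobject_add(t);
	model_create_attrs(t);
	model_kobject_hotplug(t);
}
```

With this ordering a listener that reacts to the event can expect the
files to be there already, which is the point of moving the call.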



RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Paul Mackerras
Nguyen, Tom L writes:

> Is EEH a PCI-SIG specification? Is EEH specs available in public?

No and no (not yet anyway).

> It seems that a PCI-PCI bridge per slot is hardware implementation
> specific. The fact that the PCI-PCI Bridge can isolate the slot is
> hardware feature specific.

Well, it's a common feature across all current IBM PPC64 machines.

> PCI Express AER driver uses similar concept of determining whether the
> driver is AER-aware or not except that PCI Express AER is independent
> from firmware support.

Don't worry about the firmware; the driver won't have to interact with
firmware itself, that's the job of the ppc64-specific platform code.

> Where does the platform code reside and where does it log the error?

By platform code I meant the code under the arch directory that knows
the details of the I/O topology of the machine, how to access the PCI
host bridges, etc.  How and where it logs the error is a platform
policy; on IBM ppc64 machines we have an error log daemon for this
purpose, which can do things like log the error to a file or send it
to another machine.

> In PCI Express if the driver is not AER-aware the fatal error message is
> reported by its upstream switch, the AER driver obtains comprehensive
> error information from the upstream switch (like EEH platform code
> obtains error information from the firmware). Since the driver is not
> AER-aware, the fatal error is reported to user to make a policy decision
> since the PCI Express does not have a hot-plug event for the slot like
> EEH platform. 

If there is a permanent failure of an upstream link, then maybe
generating unplug events for the devices below it would be a useful
thing to do.

> So it looks like the hot-plug capability of the driver is being used in
> lieu of specific callbacks to freeze and thaw IO in the case of a
> non-aware driver.  If the driver does not support hot-plug then the
> error is just logged.  Do you leave the slot isolated or perform error
> recovery anyway?

The choice is really to leave the slot isolated or to panic the
system.  Leaving the slot isolated risks having the driver loop in an
interrupt routine or deliver bad data to userspace, so we currently
panic the system.

> On a fatal error the interface is down.  No matter what the driver

Which interface do you mean here?

> supports (AER aware, EEH aware, unaware) all IO is likely to fail.
> Resetting a bus in a point-to-point environment like PCI Express or EEH
> (as you describe) should have little adverse effect.  The risk is the
> bus reset will cause a card reset and the driver must understand to
> re-initialize the card.  A link reset in PCI Express will not cause a
> card reset.  We assume the driver will reset its card if necessary.

How will the driver reset its card?

> In PCI Express the AER driver obtains fatal error information from the
> upstream switch driver. We can use the same API with message =
> PCIERR_ERROR_RECOVER to notify the endpoint driver, which is maybe
> unaware of the fatal error reported by its upstream device. Mostly the
> driver will respond with PCIERR_RESULT_NEED_RESET.

Sounds fine.

Paul.


Re: Why no bigphysarea in mainline?

2005-03-17 Thread Michael Ellerman
On Fri, 18 Mar 2005 01:35, Dave Hansen wrote:
> Doing mem= for drivers isn't just a hack, it's *WRONG*.  It's a ticking
> time bomb that magically happens to work on some systems.  It will not
> work consistently on a discontiguous memory system, or a memory hotplug
> system.

I couldn't agree more. Problem is I've been asked to change the way mem=X 
works on PPC64 so that this hack will work, which is a horrible thought.

> Could you give some examples of drivers which are in the kernel that
> could benefit from this patch?  We don't tend to put things like this
> in, unless they have actual users.  We don't tend to change code for
> out-of-tree users, either.

No I can't. I've been approached by several "vendors" asking about using mem=X 
hacks on PPC64, however I doubt any of them have code in-tree. I'll check 
though.

cheers




Re: [PATCH 2/2] Thinkpad Suspend Powersave: Add D2 power saving code for Thinkpads with Radeon video chipsets

2005-03-17 Thread Theodore Ts'o
On Thu, Mar 17, 2005 at 10:19:04AM +1100, Benjamin Herrenschmidt wrote:
> You probably want to remove the bit that does
> 
>   OUTREG(TV_DAC_CNTL, INREG(TV_DAC_CNTL) | 0x0700);
> 
> Or you'll lose TV output :)

I'm not using TV output, and the original patch stated:

> > +   /* Power down TV DAC, that saves a significant amount of power,
> > +* we'll have something better once we actually have some TVOut
> > +* support
> > +*/

I suppose I should re-enable the TV DAC and see how much power it
actually consumes.  It would seem to me that we should
have a way that we can power down whatever parts of the video chipset
that we're not using.  (For example if I don't have anything connected
to the VGA output, it would be good if we could power that down too...)

- Ted


RE: [ANNOUNCE][PATCH] drivers/scsi/megaraid/megaraid_{mm,mbox}

2005-03-17 Thread Ju, Seokmann
On Thursday, March 17, 2005 12:28 PM, James wrote:
> This is still rejecting:
> 
> patching file drivers/scsi/megaraid/megaraid_mm.c
> Hunk #2 FAILED at 43.
> Hunk #4 FAILED at 68.
> Hunk #5 FAILED at 1217.
> Hunk #6 FAILED at 1225.
> Hunk #7 FAILED at 1245.
> 5 out of 7 hunks FAILED -- saving rejects to file
> drivers/scsi/megaraid/megaraid_mm.c.rej
Thank you for the correction again.
This time I downloaded the source from
BK:/linux.bkbits.net:8080/linux-2.6 and found that it already
includes 'compat_ioctl' support. I'm sending another patch that
fixes this issue.
I've verified it by applying the patch to the source from BK (I also
reproduced the above error with the previous patch).
I'm learning from your great comments and I appreciate your time on this.
Please let me know if you have any comments.

Thank you.

Signed-off-by: Seokmann Ju <[EMAIL PROTECTED]>

---
diff -Naur BK/Documentation/scsi/ChangeLog.megaraid new/Documentation/scsi/ChangeLog.megaraid
--- BK/Documentation/scsi/ChangeLog.megaraid    2005-03-17 18:06:38.115075184 -0500
+++ new/Documentation/scsi/ChangeLog.megaraid   2005-03-17 09:14:03.247953384 -0500
@@ -1,3 +1,69 @@
+Release Date   : Mon Mar 07 12:27:22 EST 2005 - Seokmann Ju <[EMAIL PROTECTED]>
+Current Version: 2.20.4.6 (scsi module), 2.20.2.6 (cmm module)
+Older Version  : 2.20.4.5 (scsi module), 2.20.2.5 (cmm module)
+
+1. Added IOCTL backward compatibility.
+   Convert megaraid_mm driver to new compat_ioctl entry points.
+   I don't have easy access to hardware, so only compile tested.
+   - Signed-off-by:Andi Kleen <[EMAIL PROTECTED]>
+
+2. megaraid_mbox fix: wrong order of arguments in memset()
+   That, BTW, shows why cross-builds are useful-the only indication of
+   problem had been a new warning showing up in sparse output on alpha
+   build (number of exceeding 256 got truncated).
+   - Signed-off-by: Al Viro
+   <[EMAIL PROTECTED]>
+
+3. Convert pci_module_init to pci_register_driver
+   Convert from pci_module_init to pci_register_driver
+   (from:http://kerneljanitors.org/TODO)
+   - Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>
+
+4. Use the pre defined DMA mask constants from dma-mapping.h
+   Use the DMA_{64,32}BIT_MASK constants from dma-mapping.h when calling
+   pci_set_dma_mask() or pci_set_consistent_dma_mask(). See
+   http://marc.theaimsgroup.com/?t=10800199301=1=2 for more
+   details.
+   Signed-off-by: Tobias Klauser <[EMAIL PROTECTED]>
+   Signed-off-by: Domen Puncer <[EMAIL PROTECTED]>
+
+5. Remove SSID checking for Dobson, Lindsay, and Verde based products.
+   Checking the SSVID/SSID for controllers which have Dobson, Lindsay,
+   and Verde is unnecessary because device ID has been assigned by LSI
+   and it is a unique value. So, all controllers with these IOPs have to be
+   supported by the driver regardless of SSVID/SSID.
+
+6. Date Thu, 27 Jan 2005 04:31:09 +0100 
+   From Herbert Poetzl <> 
+   Subject RFC: assert_spin_locked() for 2.6 
+
+   Greetings!
+
+   overcautious programming will kill your kernel ;)
+   ever thought about checking a spin_lock or even
+   asserting that it must be held (maybe just for
+   spinlock debugging?) ...
+
+   there are several checks present in the kernel
+   where somebody does a variation on the following:
+
+ BUG_ON(!spin_is_locked(_lock));
+
+   so what's wrong about that? nothing, unless you
+   compile the code with CONFIG_DEBUG_SPINLOCK but 
+   without CONFIG_SMP ... in which case the BUG()
+   will kill your kernel ...
+
+   maybe it's not advised to make such assertions, 
+   but here is a solution which works for me ...
+   (compile tested for sh, x86_64 and x86, boot/run
+   tested for x86 only)
+
+   best,
+   Herbert
+
+   - Herbert Poetzl <[EMAIL PROTECTED]>, Thu, 27 Jan 2005
+
 Release Date   : Thu Feb 03 12:27:22 EST 2005 - Seokmann Ju <[EMAIL PROTECTED]>
 Current Version: 2.20.4.5 (scsi module), 2.20.2.5 (cmm module)
 Older Version  : 2.20.4.4 (scsi module), 2.20.2.4 (cmm module)
diff -Naur BK/drivers/scsi/megaraid/mega_common.h new/drivers/scsi/megaraid/mega_common.h
--- BK/drivers/scsi/megaraid/mega_common.h  2005-03-17 20:01:55.774431112 -0500
+++ new/drivers/scsi/megaraid/mega_common.h 2005-03-17 07:16:21.209546408 -0500
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff -Naur BK/drivers/scsi/megaraid/megaraid_mbox.c new/drivers/scsi/megaraid/megaraid_mbox.c
--- BK/drivers/scsi/megaraid/megaraid_mbox.c    2005-03-17 20:01:55.782429896 -0500
+++ new/drivers/scsi/megaraid/megaraid_mbox.c   2005-03-17 09:03:41.275507568 -0500
@@ -10,7 +10,7 @@
  *2 of the License, or (at your option) any later version.
  *
  * FILE: megaraid_mbox.c
- * Version 

Re: Error messages with ACPI

2005-03-17 Thread Len Brown
On Sat, 2005-03-05 at 13:09, Mina Nozar wrote:

> kernel:  ACPI-1133: *** Error: Method execution failed
> [\_SB_.BAT0._BST]
> (Node dfe043c0), AE_AML_NO_RETURN_VALUE

Please try the latest mm tree and report if these go away.


thanks,
-Len




Re: [PATCH] Prezeroing V8

2005-03-17 Thread Benjamin Herrenschmidt
On Thu, 2005-03-17 at 15:06 -0800, Christoph Lameter wrote:

> I want to sleep 30 seconds because the system load is unlikely to change
> frequently.

Ugh ? That sounds like a magic number coming right from your hat or from
your test scenario ...

Ben.



Re: BKCVS broken ?

2005-03-17 Thread H. Peter Anvin
Followup to:  <[EMAIL PROTECTED]>
By author:[EMAIL PROTECTED] (Larry McVoy)
In newsgroup: linux.dev.kernel
>
> I'll check into it.  We've been having problems with connecting to 
> master.kernel.org, yup, here you go, anyone else seeing this?
> 
> From [EMAIL PROTECTED]  Thu Mar 17 05:06:57 2005
> Date: Thu, 17 Mar 2005 05:00:57 -0800
> From: [EMAIL PROTECTED] (Cron Daemon)
> To: [EMAIL PROTECTED]
> Subject: Cron <[EMAIL PROTECTED]> /bk-cvsexport/src/UPDATE
> 
> Read from remote host master.kernel.org: Connection timed out
> 

Please Cc: any reports of badness on kernel.org to
[EMAIL PROTECTED]; I would have seen this quicker that way.

Around the time the above happened the machine was pretty bogged down,
because we're preparing new hardware to replace the main server, and
were doing some very large copies.  It might have caused a timeout.

I notice a long login from you at approximately 14:00 PST; does that
mean this is no longer an issue?

-hpa


Re: binfmt_elf padzero problems

2005-03-17 Thread Paul Mackerras
Andrew Morton writes:

> I guess if the bss has zero length then we can skip the zeroing of the end
> of the page at the end of bss, as long as we're dead sure that we didn't
> accidentally instantiate a single page on behalf of that zero-length bss.

There is another thing I noticed about the bss code, which is that it
doesn't give the bss the permissions from the PT_LOAD segment, rather
it just uses VM_DATA_DEFAULT_FLAGS.  That doesn't matter at the moment
but may matter in future for ppc32.

Paul.


Re: [patch 12/12] scripts/mod/sumversion.c: replace strtok() with strsep()

2005-03-17 Thread Nicolas Kaiser
* Sam Ravnborg <[EMAIL PROTECTED]>:

> On Sat, Mar 05, 2005 at 04:35:45PM +0100, [EMAIL PROTECTED] wrote:
> > 
> > Replaces strtok() with strsep()
> 
> Why - does it increase portability?

 "strtok() is not thread and SMP safe and strsep() should be
used instead"

http://janitor.kernelnewbies.org/docs/driver-howto.html#3.3.1
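
For what it's worth, the difference is that strtok() keeps its position in
hidden static state, while strsep() takes the cursor from the caller. A
minimal user-space sketch, assuming a libc that provides strsep() (glibc and
the BSDs do; it is not ISO C). The helper name split_fields() is made up for
illustration and is not what sumversion.c uses:

```c
#define _DEFAULT_SOURCE		/* expose strsep() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split buf in place on any character in delims and store up to max
 * token pointers in out. Returns the number of tokens stored.
 * The scan position lives in the local 'cursor', so two callers
 * (or two threads) working on different buffers cannot interfere.
 */
size_t split_fields(char *buf, const char *delims, char **out, size_t max)
{
	char *cursor = buf;
	char *tok;
	size_t n = 0;

	assert(buf != NULL && delims != NULL && out != NULL);

	while (n < max && (tok = strsep(&cursor, delims)) != NULL)
		out[n++] = tok;

	return n;
}
```

Note one behavioural difference when converting: strsep() returns an empty
token for two adjacent delimiters, where strtok() would skip them.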

Cheers,
n.


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Benjamin Herrenschmidt
On Thu, 2005-03-17 at 10:53 -0800, Nguyen, Tom L wrote:

> To support the AER driver calling an upstream device to initiate a reset
> of the link we need a specific callback since the driver doing the reset
> is not the driver who got the error.  In the case of general PCI this
> could be useful if a PCI bus driver were available to support the
> callback for a bridge device.  This would also support specific error
> recovery calls to reset an endpoint adapter.  We need a call to request
> a driver to perform a reset on a link or device.  

That is quite implementation specific, it doesn't need to be part of the
API (the way the general error management is implemented in PCIE could
be completely done within the bus drivers I suppose). Again, I'm not
trying to define or force a given implementation. I'm trying to define
the driver-side API, that's all.

I have difficulties following all of your previous explanations, I must
admit. My point here is I'd like you to find out if the API can fit on
the driver side, and if not, what would need to be changed. For example,
we might want to distinguish between slot reset (full hard reset) and
link reset, that sort of thing (thus adding a new state for link reset
and a new return code for the others for requesting a link reset if
possible, platforms that don't do it, like IBM EEH PCI would just
fallback to full reset).
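
A rough user-space sketch of that fallback idea, to show how small it could
stay on the driver side. Only the NEED_RESET notion comes from this thread;
the other enum values, the function names and the "most drastic request
wins" rule are assumptions made for illustration, not a proposed interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordered from least to most drastic so results can be compared. */
enum pcierr_result {
	PCIERR_RESULT_CAN_RECOVER = 0,	/* hypothetical: no reset needed */
	PCIERR_RESULT_NEED_LINK_RESET,	/* hypothetical new return code */
	PCIERR_RESULT_NEED_RESET,	/* full slot reset */
};

/* Combine the answers of all drivers on the affected segment (assumed rule). */
enum pcierr_result pcierr_merge(const enum pcierr_result *results, size_t n)
{
	enum pcierr_result worst = PCIERR_RESULT_CAN_RECOVER;
	size_t i;

	assert(results != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (results[i] > worst)
			worst = results[i];

	return worst;
}

/* A platform that cannot reset just the link falls back to a full reset. */
enum pcierr_result pcierr_effective(enum pcierr_result wanted,
				    bool has_link_reset)
{
	if (wanted == PCIERR_RESULT_NEED_LINK_RESET && !has_link_reset)
		return PCIERR_RESULT_NEED_RESET;

	return wanted;
}
```

The driver only ever states what it needs; whether a link reset exists is
decided by the platform side, so the driver stays bus agnostic.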

Again, the goal here is to have a way for drivers to be mostly bus
agnostic (that is not have to care if they are running on PCI, PCI-X,
PCIE, with or without IBM EEH mechanism, and whatever other mechanism
another vendor might provide) and still implement basic error recovery.

A driver _designed_ for a PCI-Express device that knows it's on PCI
Express can perfectly use additional APIs to gather more error details,
etc... but it would be nice to fit the "common needs" as much as
possible in a common and _SIMPLE_ API. The simplicity here is a
requirement, I'm very serious about it, because if it's not simple,
drivers either won't implement it or won't get it right.

Ben.




Re: [PATCH] Prezeroing V8 + free_hot_zeroed_page + free_cold_zeroed page

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Jason Uhlenkott wrote:

> On Thu, Mar 17, 2005 at 05:36:50PM -0800, Christoph Lameter wrote:
> > +while (avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT)) {
> > +   set_current_state(TASK_UNINTERRUPTIBLE);
> > +   schedule_timeout(30*HZ);
> > +   }
>
> This should probably be TASK_INTERRUPTIBLE.  It'll never actually get
> interrupted either way since kernel threads block all signals, but
> sleeping uninterruptibly contributes to the load average.

Correct.  I just do not seem to be able to get this right.



RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Nguyen, Tom L
On Wednesday, March 16, 2005 7:20 PM Benjamin Herrenschmidt wrote:
>> What mechanism (message??) is used to perform the bus and/or link
>> level reset?  For PCI Express the reset is performed by the upstream
>> port driver.  My API takes this into account.  Are you assuming the PCI
>> device on the bus does the reset or will there be a PCI bus driver that
>> will do the reset?  How will the PCI error handling code initiate a
>> reset?
>
>The "caller", that is the error management framework. I'm defining the
>API at the driver level, not the implementation at the core level.
>
>For example, on IBM pSeries with PCI-Express, we will probably not have
>an AER driver. This will be all dealt with by the firmware which will mimic
>that to the existing EEH error management. We'll have the same API to do
>the reset that we have today for resetting a slot.

We decided to implement PCI Express error handling based on the PCI
Express specification in a platform independent manner.  This allows any
platform that implements PCI Express AER per the PCI SIG specification
to take advantage of the advanced features, much like SHPC hot-plug or
PCI Express hot-plug implementations.

>You may have noticed in general that I didn't either define who is
>calling those callbacks. It's all implicit that this is done by platform
>error management code. For example, on ppc64, even the recovery step
>requires action from the platform since the slot has been physically
>isolated. After we have notified all drivers  with the "error detected"
>callback, if we decide we can try the "recover" step (all drivers
>returned they could try it and we decided the error wasn't too fatal) we
>will call the firmware to re-enable IOs on the slot and call the
>"recover" step.

For PCI Express the endpoint device driver can take recovery action on
its own, depending on the nature of the error so long as it does not
affect the upstream device.  This can include endpoint device resets.
We expect the driver to do this upon error notification, if possible.
In PCI Express since the driver will have the most knowledge regarding
the error it will have the best ability to do device dependent recovery
and IO retry.  If its recovery fails then the AER driver will ask the
upstream device driver to perform the link reset.  Since this is more of
a side effect an explicit call to recover is not necessary.  However, we
understand and agree that it is needed to support the general error
recovery cases for PCI.

To support the AER driver calling an upstream device to initiate a reset
of the link we need a specific callback since the driver doing the reset
is not the driver who got the error.  In the case of general PCI this
could be useful if a PCI bus driver were available to support the
callback for a bridge device.  This would also support specific error
recovery calls to reset an endpoint adapter.  We need a call to request
a driver to perform a reset on a link or device.  

Thanks,
Long


Re: [PATCH] Prezeroing V8 + free_hot_zeroed_page + free_cold_zeroed page

2005-03-17 Thread Jason Uhlenkott
On Thu, Mar 17, 2005 at 05:36:50PM -0800, Christoph Lameter wrote:
> +while (avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT)) {
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + schedule_timeout(30*HZ);
> + }

This should probably be TASK_INTERRUPTIBLE.  It'll never actually get
interrupted either way since kernel threads block all signals, but
sleeping uninterruptibly contributes to the load average.  
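
As a side note on the comparison itself: avenrun[] holds the load average in
fixed point with FSHIFT fractional bits, which is why the integer sysctl
value gets shifted before comparing. A small user-space sketch of just that
test, assuming FSHIFT is 11 as in the kernel headers; load_too_high() is a
made-up name, not something from the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define FSHIFT	11			/* bits of fractional precision */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed point */

/*
 * True when the fixed-point 1-minute load average is at or above the
 * integer threshold. The threshold is scaled up to fixed point rather
 * than the load being scaled down, so no precision is lost.
 */
bool load_too_high(unsigned long avenrun0, unsigned int threshold)
{
	/* Guard against the shift overflowing an unsigned long. */
	assert(threshold <= (ULONG_MAX >> FSHIFT));

	return avenrun0 >= ((unsigned long)threshold << FSHIFT);
}
```

For example, a load of 1.25 is 1 * FIXED_1 + FIXED_1 / 4 = 2560, which is
below a threshold of 2 (4096), so the scrubber would not back off.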


Re: oom with 2.6.11

2005-03-17 Thread Christian Kujau
Coywolf Qi Hunt wrote:
> I do "grep check-route.sh oom_2.6.11.3.txt | wc" and it shows 4365

duh, good catch! really!

> lines, which means there are 4365 processes of that script running, from
> pid 4260 to 12747, mostly with pretty low points, 123.
> Based on this points, suppose each script consumes 100k, that'll be
> 100k*4k=400M roughly. And your box's is merely 256M MemTotal.

yes, i just checked, the script is looping and crond is starting a new
one, and another... and the oom-killer does not catch it, because it's too
small and of course doesn't know where it is coming from (crond).

> Check this script and disable it; see what will happen.

yes, will do that. on a (not so unimportant) side-note: i was told the
whole thing should be fixed with 2.6.11.4:

  [PATCH] CAN-2005-0384: Remote Linux DoS on ppp servers


after all it seems to be PEBKAC and bad luck...what a week.

thank you for your help,
Christian.
-- 
BOFH excuse #416:

We're out of slots on the server


2.6.11-mm3 - redzone mismatch

2005-03-17 Thread Andrew James Wade
My computer crashed twice today. Both times I was unable to use the keyboard,
but was able to shutdown X.

I have had hardware problems related to overheating, but I believe that I have
resolved my overheating problems, and in any event I had not been stressing
the cpu since the first crash.

The following messages appeared in my kernel log before my first crash:

Mar 17 12:51:33 kenichi kernel: slab dentry_cache: redzone mismatch in slabp c2802000, objp c2802ab8, bufctl 0xfffe
Mar 17 12:51:33 kenichi kernel: Redzone: 0x170fc2a5/0x120fc2a5.
Mar 17 12:51:33 kenichi kernel: Last user: [d_alloc+28/464](d_alloc+0x1c/0x1d0)
Mar 17 12:51:33 kenichi kernel: 000: 00 00 00 00 00 00 00 00 a4 7a 32 cb 34 2e 80 c2
Mar 17 12:51:33 kenichi kernel: 010: 1d 47 cc d2 0f 00 00 00 20 2b 80 c2 6c 2b 80 c2
Mar 17 12:51:33 kenichi kernel: slab dentry_cache: redzone mismatch in slabp c2802000, objp c2802b4c, bufctl 0xfffe
Mar 17 12:51:33 kenichi kernel: Redzone: 0x90fc2a5/0x170fc2a5.
Mar 17 12:51:33 kenichi kernel: Last user: [d_alloc+28/464](d_alloc+0x1c/0x1d0)
Mar 17 12:51:33 kenichi kernel: 000: 00 00 00 00 00 00 00 35 8c 7c 32 cb 34 2e 80 12
Mar 17 12:51:33 kenichi kernel: 010: 9a d3 fd f5 15 00 00 09 b4 2b 80 c2 00 2c 80 35
...
repeat every five minutes for a few hours
...
then:

Mar 17 17:56:35 kenichi kernel:  ff ff ff ff ff bf ff
Mar 17 17:56:35 kenichi kernel: 12ff0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13000: 46 41 43 53 40 00 00 00 00 00 00 00 00 
00 00 00
Mar 17 17:56:35 kenichi kernel: 13010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
Mar 17 17:56:35 kenichi kernel: 13020: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
Mar 17 17:56:35 kenichi kernel: 13030: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
Mar 17 17:56:35 kenichi kernel: 13040: ff fb ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13050: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13060: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13070: ff ff ff ff ff ff f7 ff ff ff df ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13080: ff ff ff ff ff ff ff ff ff ff f7 ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13090: ff ff ff df ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 130a0: ff ff ff ff ff ff ff ff ff ff ff ef ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 130b0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 130c0: bf ff ff ff ff ff ff ff ff ff ff ff ff 
7f ff ff
Mar 17 17:56:35 kenichi kernel: 130d0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 130e0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 130f0: ff ff ff ff ff bf ff ff ff f7 ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13100: ff ff ff ff ff ff bf ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13110: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13120: ff ff ff ff ff ff ff ff ff ff ff ef ff 
ff db ff
Mar 17 17:56:35 kenichi kernel: 13130: ff ff ff ff ff ff ff ff fb ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13140: ff ff ff ff ff ff ff ff db ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13150: ff ff ff ff ff ff ff ff f3 ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13160: ff ff 7f ff ff ff ff fb ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13170: ff ff ff ef ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13180: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13190: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 131a0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff fb ff
Mar 17 17:56:35 kenichi kernel: 131b0: ff ff ff ff ff ff ff ff ff ff ff ff ef 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 131c0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 131d0: ff ff fd ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 131e0: ff ff ff ff ff ff ff ff ef ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 131f0: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ad ff ff
Mar 17 17:56:35 kenichi kernel: 13200: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13210: ff ff ff ff ff ff ff ff ff bf df ff ff 
ff ef ff
Mar 17 17:56:35 kenichi kernel: 13220: ff ff ff ff ff ff ff ff ff ff ef ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13230: ff 7f ff ff ff ff ff ff ff ff ff ff ff 
ff ff 7f
Mar 17 17:56:35 kenichi kernel: 13240: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff df
Mar 17 17:56:35 kenichi kernel: 13250: ff f7 ff ff ff ff fe ff ff ff ff ff ff 
ff ff fe
Mar 17 17:56:35 kenichi kernel: 13260: ff ff ff ff ff ff ff ff ff bf ff ff ff 
fe df ff
Mar 17 17:56:35 kenichi kernel: 13270: ff ff ff ff ff ff ff ff ff ff ff ff ff 
ff ff ff
Mar 17 17:56:35 kenichi kernel: 13280: ef ff ff 

Re: [PATCH] Prezeroing V8 + free_hot_zeroed_page + free_cold_zeroed page

2005-03-17 Thread Christoph Lameter
Here is the fixed up zeroing patch with management of hot/cold zeroed
pages.

If quicklists would like to use this then they need to use

free_hot_zeroed_page(page)

and

get_zeroed_page(GFP)

for their management of hot zeroed pages. If the pool is empty then it
will be replenished either from the pool built up by kscrubd or by zeroing
a couple of pages on the fly.

The most expensive operation in the page fault handler (apart from SMP
locking overhead) is the touching of all cache lines of a page by
zeroing the page. This zeroing means that all cachelines of the faulted
page (on Altix that means all 128 cachelines of 128 bytes each) must be
handled and later written back. This patch makes it possible to avoid
touching all cachelines if only some of the cachelines of that page are
needed immediately after the fault. Doing so will only be effective for
sparsely accessed memory, which is typical for anonymous memory and pte
maps.
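
As a rough illustration of the cost described above, the sketch below counts
the cache lines that zeroing a whole page must write, and how many of those
are wasted when the faulting task only touches a few lines. The 128-byte line
and 128 lines per page are the Altix figures quoted above; the function names
and the "lines used" figure are assumptions made for the example and are not
part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Cache lines that zeroing a whole page must write. */
static size_t lines_per_page(size_t page_size, size_t line_size)
{
	assert(line_size != 0 && page_size % line_size == 0);
	return page_size / line_size;
}

/*
 * Cache lines zeroed at fault time but not needed by the faulting task,
 * assuming it touches only `lines_used` distinct lines of the page.
 */
static size_t lines_wasted(size_t page_size, size_t line_size,
			   size_t lines_used)
{
	size_t total = lines_per_page(page_size, line_size);

	return lines_used >= total ? 0 : total - lines_used;
}
```

With a 16384-byte page and 128-byte lines, a task that touches 4 lines leaves
124 lines zeroed for nothing at fault time, which is the case prezeroing is
aimed at.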

The patch can make prezeroing more effective by also allowing the use
of hardware devices to offload zeroing from the cpu. This avoids
the invalidation of the cpu caches by extensive zeroing operations.
For that purpose a driver may register a zeroing driver via

register_zero_driver(z)

When the number of zeroed pages falls below a lower threshold (defined
by setting /proc/sys/vm/scrub_start) kscrubd is invoked (similar
to the swapper). kscrubd then zeroes free pages until the upper
threshold is reached (set by /proc/sys/vm/scrub_stop). The zeroing
is performed on a percentage of pages at each order of freed pages to
minimize fragmentation of pages.

kscrubd performs short bursts of zeroing when needed and tries to stay
off the processor as much as possible. kscrubd will only run when the load
is less than the value set in /proc/sys/vm/scrub_load (defaults to 1).
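
The start/stop thresholds and the load limit described above amount to a
simple hysteresis loop. A minimal user-space sketch follows, assuming
hypothetical names (struct scrub_state, scrub_should_run); it is only a model
of the described behaviour, not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the kscrubd tunables; not the patch's code. */
struct scrub_state {
	unsigned long scrub_start;	/* wake when zeroed pages fall below this */
	unsigned long scrub_stop;	/* stop once zeroed pages reach this */
	unsigned long scrub_load;	/* only run while load is below this */
	bool running;
};

/* Decide whether the scrubber should be zeroing pages right now. */
static bool scrub_should_run(struct scrub_state *s,
			     unsigned long zeroed_pages, unsigned long load)
{
	assert(s->scrub_start <= s->scrub_stop);

	if (load >= s->scrub_load) {
		s->running = false;	/* stay off a busy processor */
		return false;
	}
	if (zeroed_pages < s->scrub_start)
		s->running = true;	/* pool is low: start a burst */
	else if (zeroed_pages >= s->scrub_stop)
		s->running = false;	/* pool is full: go back to sleep */
	/* between the two thresholds, keep the previous decision */
	return s->running;
}
```

Keeping two thresholds rather than one avoids waking the scrubber for every
single page taken from the pool.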

The patch also provides the management of hot and cold lists for
zeroed pages in the pageset structure.

Patch against 2.6.11.3-bk3. Performance data may be found at
http://oss.sgi.com/projects/page_fault_performance/

Changelog:
- Cleanup and document more clearly
- Add full support for hot/cold zeroed pages.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.11/mm/page_alloc.c
===
--- linux-2.6.11.orig/mm/page_alloc.c   2005-03-17 16:38:55.0 -0800
+++ linux-2.6.11/mm/page_alloc.c   2005-03-17 17:28:27.0 -0800
@@ -12,6 +12,8 @@
  *  Zone balancing, Kanoj Sarcar, SGI, Jan 2000
  *  Per cpu hot/cold page lists, bulk allocation, Martin J. Bligh, Sept 2002
  *  (lots of bits borrowed from Ingo Molnar & Andrew Morton)
+ *  Page zeroing by Christoph Lameter, SGI, Dec 2004 using
+ * initial code for __GFP_ZERO support by Andrea Arcangeli, Oct 2004.
  */

 #include 
@@ -34,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include "internal.h"
@@ -180,16 +183,20 @@ static void destroy_compound_page(struct
  * zone->lock is already acquired when we use these.
  * So, we don't need atomic page->flags operations here.
  */
-static inline unsigned long page_order(struct page *page) {
+static inline unsigned long page_zorder(struct page *page) {
return page->private;
 }

-static inline void set_page_order(struct page *page, int order) {
-   page->private = order;
+/* We use bit PAGE_PRIVATE_ZERO_SHIFT in page->private to encode
+ * the zeroing status. This makes buddy pages with different zeroing
+ * status not match to avoid merging zeroed with unzeroed pages
+ */
+static inline void set_page_zorder(struct page *page, int order, int zero) {
+   page->private = order + (zero << PAGE_PRIVATE_ZERO_SHIFT);
__SetPagePrivate(page);
 }

-static inline void rmv_page_order(struct page *page)
+static inline void rmv_page_zorder(struct page *page)
 {
__ClearPagePrivate(page);
page->private = 0;
@@ -231,14 +238,15 @@ __find_combined_index(unsigned long page
  * we can do coalesce a page and its buddy if
  * (a) the buddy is free &&
  * (b) the buddy is on the buddy system &&
- * (c) a page and its buddy have the same order.
+ * (c) a page and its buddy have the same order and the same
+ * zeroing status.
  * for recording page's order, we use page->private and PG_private.
  *
  */
-static inline int page_is_buddy(struct page *page, int order)
+static inline int page_is_buddy(struct page *page, int order, int zero)
 {
if (PagePrivate(page)   &&
-   (page_order(page) == order) &&
+   (page_zorder(page) == order + (zero << PAGE_PRIVATE_ZERO_SHIFT)) &&
!PageReserved(page) &&
 page_count(page) == 0)
return 1;
@@ -270,7 +278,7 @@ static inline int page_is_buddy(struct p
  */

 static inline void __free_pages_bulk (struct page *page,
-   struct zone *zone, unsigned int order)
+   struct zone *zone, unsigned int order, unsigned int zero)
 {
unsigned long page_idx;
int 
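
The buddy-matching change in the hunks above can be modelled in user space.
This is a sketch only: the value chosen for PAGE_PRIVATE_ZERO_SHIFT is an
assumption (the patch defines the real one elsewhere), and make_zorder() and
zorder_matches() are hypothetical helper names standing in for
set_page_zorder() and the comparison in page_is_buddy().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for illustration only; the patch defines the real one. */
#define PAGE_PRIVATE_ZERO_SHIFT 10

/* Pack a buddy order and a zeroed flag the way set_page_zorder() does. */
static unsigned long make_zorder(unsigned int order, unsigned int zero)
{
	assert(order < (1UL << PAGE_PRIVATE_ZERO_SHIFT));
	assert(zero <= 1);
	return order + ((unsigned long)zero << PAGE_PRIVATE_ZERO_SHIFT);
}

/* Two free blocks may merge only if order and zeroing status both match. */
static bool zorder_matches(unsigned long buddy_private,
			   unsigned int order, unsigned int zero)
{
	return buddy_private == make_zorder(order, zero);
}
```

Because the flag lives in the same word as the order, a single comparison
rejects a buddy of the right order but the wrong zeroing status.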

Re: [2.6 patch] USB: possible cleanups

2005-03-17 Thread Greg KH
On Tue, Mar 01, 2005 at 01:43:52AM +0100, Adrian Bunk wrote:
> Before I'm getting flamed to death:
> This patch contains possible cleanups. If parts of this patch conflict
> with pending changes these parts of my patch have to be dropped.
> 
> This patch contains the following possible cleanups:
> - make needlessly global code static
> - #if 0 the following unused global functions:
>   - core/usb.c: usb_buffer_map
>   - core/usb.c: usb_buffer_unmap
> - remove the following unneeded EXPORT_SYMBOL's:
>   - core/hcd.c: usb_bus_init
>   - core/hcd.c: usb_alloc_bus
>   - core/hcd.c: usb_register_bus
>   - core/hcd.c: usb_deregister_bus
>   - core/hcd.c: usb_hcd_irq
>   - core/usb.c: usb_buffer_map
>   - core/usb.c: usb_buffer_unmap
>   - core/buffer.c: hcd_buffer_create
>   - core/buffer.c: hcd_buffer_destroy
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Looks good to me, thanks for the patch.  Applied.

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ACPI] Re: Fw: Anybody? 2.6.11 (stable and -rc) ACPI breaks USB

2005-03-17 Thread Li Shaohua
On Fri, 2005-03-18 at 02:08, Bjorn Helgaas wrote:
> On Thu, 2005-03-17 at 09:33 +0800, Li Shaohua wrote:
> > The comments in previous quirk said it's required only in PIC mode.
> ...
> > I feel we concerned too much. Changing the interrupt line isn't harmful,
> > right? Linux actually ignored interrupt line. Maybe just a
> > PCI_FIXUP_ENABLE(PCI_VENDOR_ID_VIA, PCI_ANY_ID, quirk_via_irq) is
> > sufficient.
> 
> I think it's good to limit the scope of the quirk as much as
> possible because that makes it easier to do future restructuring,
> such as device-specific interrupt routers.
> 
> The comment (before quirk_via_acpi(), nowhere near quirk_via_irqpic())
> says *on-chip devices* have this unusual behavior when the interrupt
> line is written.  That makes sense to me.
> 
> Writing the interrupt line on random plug-in Via PCI devices does
> not make sense to me, because for that to have any effect, an
> upstream bridge would have to be snooping the traffic going through
> it.  That doesn't sound plausible to me.
> 
> What about this:
Hmm, this looks like the previous solution. We removed the specific VIA
quirk because we don't know how many devices have this issue. Every
time we encounter an IRQ issue in a VIA PCI device, we will suspect it
requires the quirk and keep trying. This is a big overhead.

Thanks,
Shaohua



Re: [2.6 patch] drivers/usb/net/pegasus.c: make some code static

2005-03-17 Thread Greg KH
On Tue, Mar 01, 2005 at 01:35:41AM +0100, Adrian Bunk wrote:
> This patch makes some needlessly global code static.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks.

greg k-h



Re: [2.6 patch] remove drivers/usb/image/hpusbscsi.c

2005-03-17 Thread Greg KH
On Thu, Mar 03, 2005 at 02:38:56PM +0100, Adrian Bunk wrote:
> USB_HPUSBSCSI was marked as BROKEN in 2.6.11 since libsane is the 
> preferred way to access these devices.
> 
> Unless someone plans to resurrect this driver, I'm therefore proposing 
> this patch to completely remove it.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks.

greg k-h



Re: [2.6 patch] drivers/usb/storage/: cleanups

2005-03-17 Thread Greg KH
On Tue, Mar 01, 2005 at 01:37:58AM +0100, Adrian Bunk wrote:
> This patch contains the following cleanups:
> - make needlessly global code static
> - scsiglue.c: remove the unused usb_stor_sense_notready
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks.

greg k-h



Re: [2.6 patch] drivers/usb/serial/: make some functions static

2005-03-17 Thread Greg KH
On Tue, Mar 01, 2005 at 01:39:35AM +0100, Adrian Bunk wrote:
> This patch makes some needlessly global functions static.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Applied, thanks.

greg k-h


AIO panic on 2.6.11 on PPC64 caused by is_hugepage_only_range()

2005-03-17 Thread Daniel McNeil
When testing AIO on PPC64 (a power5 machine) running 2.6.11 with
CONFIG_HUGETLB_PAGE=y, I ran into a kernel panic when a process exits that has
done AIO (io_queue_init()) but has not done the io_queue_release().  The
exit_aio() code is cleaning up and panics when trying to free the aio ring
buffer.

I tracked this down to is_hugepage_only_range() (include/asm-ppc64/page.h)
which calls touches_hugepage_low_range(), which checks
current->mm->context.htlb_segs.  The problem is that exit_mm()
has cleared tsk->mm before doing the mmput(), which leads to the exit_aio()
and then the panic.  It looks like is_hugepage_only_range() is only used
on ia64 and ppc64.  A possible fix is to change is_hugepage_only_range()
to take an 'mm' as a parameter as well as 'addr' and 'len', and then
the ppc64 code could change to use 'mm'.  It looks like it has been
broken for quite a while.
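
A user-space sketch of the interface change proposed above. Every name here
is hypothetical (toy_mm, toy_task, toy_is_hugepage_only_range), and the
16 low segments of 256MB each are an assumption made for the example. The
only point is that the check reads the mm it is handed rather than
current->mm, so a caller that still holds a valid mm after tsk->mm has been
cleared does not dereference a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; every name is hypothetical. */
struct toy_mm {
	unsigned int htlb_segs;	/* bitmap of low segments used for hugepages */
};

struct toy_task {
	struct toy_mm *mm;	/* cleared by exit_mm() before mmput() */
};

#define TOY_SEG_SHIFT	28	/* assumed 256MB segments */
#define TOY_LOW_SEGS	16	/* assumed number of low segments */

/* Does [addr, addr + len) overlap a hugepage-only low segment of this mm? */
static bool toy_is_hugepage_only_range(const struct toy_mm *mm,
				       unsigned long addr, unsigned long len)
{
	unsigned long first, last, seg;

	assert(mm != NULL && len != 0);
	first = addr >> TOY_SEG_SHIFT;
	last = (addr + len - 1) >> TOY_SEG_SHIFT;
	for (seg = first; seg <= last && seg < TOY_LOW_SEGS; seg++)
		if (mm->htlb_segs & (1U << seg))
			return true;
	return false;
}
```

In this model the exit path would pass the mm it is tearing down, so the
task's own mm pointer being NULL no longer matters.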

Here's the stack trace:

cpu 0x2: Vector: 300 (Data Access) at [c001d1be7590]
pc: c0092960: .unmap_region+0x17c/0x4a4
lr: c0092bb0: .unmap_region+0x3cc/0x4a4
sp: c001d1be7810
   msr: 80009032
   dar: 298
 dsisr: 4000
  current = 0xc1dd77b0
  paca= 0xc0595c00
pid   = 11336, comm = aiodio_readoff
[c001d1be78e0] c0093d08 .do_munmap+0x240/0x408
[c001d1be79b0] c00d11b4 .aio_free_ring+0x10c/0x1d8
[c001d1be7a50] c00d162c .__put_ioctx+0x84/0x120
[c001d1be7af0] c00d3640 .exit_aio+0xf4/0x100
[c001d1be7b80] c004dfd4 .mmput+0x80/0x15c
[c001d1be7c20] c0053648 .exit_mm+0x1b4/0x264
[c001d1be7cc0] c00555ac .do_exit+0x10c/0xdb0
[c001d1be7d90] c00562a8 .do_group_exit+0x58/0xd8
[c001d1be7e30] c000d500 syscall_exit+0x0/0x18

Here's a program that produces the panic:
(compile using cc -o aiodio_read aiodio_read.c -laio).
--
#define _XOPEN_SOURCE 600
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

#include <libaio.h>


int pagesize;
char *iobuf;
io_context_t myctx;
int aio_maxio = 4;

/*
 * do an AIO DIO read
 */
int do_aio_direct_read(int fd, char *iobuf, int offset, int size)
{
	struct iocb myiocb;
	struct iocb *iocbp = &myiocb;
	int ret;
	struct io_event e;
	struct stat s;

	/* queue a single read of 'size' bytes at 'offset' */
	io_prep_pread(&myiocb, fd, iobuf, size, offset);
	if ((ret = io_submit(myctx, 1, &iocbp)) != 1) {
		perror("io_submit");
		return ret;
	}

	/* wait for that one request to complete */
	ret = io_getevents(myctx, 1, 1, &e, 0);

	if (ret) {
		struct iocb *iocb = e.obj;
		int iosize = iocb->u.c.nbytes;
		char *buf = iocb->u.c.buf;
		long long loffset = iocb->u.c.offset;

		printf("AIO read of %d at offset %lld returned %ld\n",
			iosize, loffset, (long) e.res);
	}

	return ret;
}

int main(int argc, char *argv[])
{
	char *filename;
	int fd;
	int err;

	filename = "test.aio.file";
	fd = open(filename, O_RDWR|O_DIRECT|O_CREAT|O_TRUNC, 0666);

	pagesize = getpagesize();
	err = posix_memalign((void**) &iobuf, pagesize, pagesize);
	if (err) {
		fprintf(stderr, "Error allocating %d aligned bytes.\n",
			pagesize);
		exit(1);
	}
	err = write(fd, iobuf, pagesize);
	if (err != pagesize) {
		fprintf(stderr, "Error ret = %d writing %d bytes.\n",
			err, pagesize);
		perror("");
		exit(1);
	}
	memset(&myctx, 0, sizeof(myctx));
	io_queue_init(aio_maxio, &myctx);
	err = do_aio_direct_read(fd, iobuf, 0, pagesize);
	close(fd);

	/* exit without io_queue_release(): exit_aio() runs at process exit */
	printf("This will panic on ppc64\n");
	return err;
}
--


Daniel




EIP and VMA

2005-03-17 Thread Luca Falavigna
Hi,
I am working on this piece of code (simplified):

void ip_vma(struct task_struct *task, struct pt_regs *regs)
{
	struct mm_struct *mm;
	struct vm_area_struct *vma;

	if(task) {
		mm = get_task_mm(task);
		if(mm) {
			vma = find_vma(mm, regs->eip);
			if(vma) {
				/* Some code */
			}
			else
				printk("WARNING: No VMA\n");
			mmput(mm);
		}
	}
}

I would like to get the VMA containing a task's instruction pointer. To do so, I
use the find_vma function, passing regs->eip as the instruction pointer value.
Unfortunately I always get the "WARNING: No VMA" message because find_vma isn't able
to find the VMA that the regs->eip address belongs to.
Is regs->eip the right place to find the instruction pointer, or should I
look for that value elsewhere?

Thank you,



Luca


Re: [patch 1/2] fork_connector: add a fork connector

2005-03-17 Thread Evgeniy Polyakov
On Thu, 17 Mar 2005 08:56:57 -0800
Jesse Barnes <[EMAIL PROTECTED]> wrote:

> On Thursday, March 17, 2005 1:04 am, Guillaume Thouvenin wrote:
> > +static inline void fork_connector(pid_t parent, pid_t child)
> > +{
> > + static DEFINE_SPINLOCK(cn_fork_lock);
> > + static __u32 seq;   /* used to test if message is lost */
> > +
> > + if (cn_fork_enable) {
> > +  struct cn_msg *msg;
> > +
> > +  __u8 buffer[CN_FORK_MSG_SIZE];
> > +
> > +  msg = (struct cn_msg *)buffer;
> > +
> > +  memcpy(&msg->id, &cn_fork_id, sizeof(msg->id));
> > +  spin_lock(&cn_fork_lock);
> > +  msg->seq = seq++;
> > +  spin_unlock(&cn_fork_lock);
> 
> As I mentioned before, this won't work very well on a large CPU count system. 
>  
> cn_fork_lock will be taken by each CPU everytime it does a fork, meaning that 
> forks will be very slow if lots of CPUs are doing them at the same time.  Is 

Maybe... But consider a ppc system:
each lock is about 10 instructions (or even fewer),
increment with return is about 3-5 instructions, and unlock is a
barrier() and a store.
The whole fork syscall contains far too many
instructions (do_fork() itself is more than 500,
not counting the instructions in the
functions that are called from do_fork())
to care about 20 idle instructions on each CPU,
even if there are 512 of them.

The most significant cost there is the requirement to keep the
u32 seq in each CPU's cache, and thus to flush the cacheline and
invalidate/refetch it from memory on every other CPU
each time it is accessed, which is a big price.

> there a more scalable way to ensure message delivery?

It is entirely Guillaume's work, so he decides, but
I would recommend per-CPU counters and the processor's
id in each message.
And of course userspace should take care of misordered
messages.
I personally prefer such a mechanism.
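
A minimal user-space sketch of the per-CPU counter idea suggested above. All
names (toy_fork_msg, toy_fork_event, toy_fork_msg_lost) and the CPU count are
assumptions made for the example; this is not the connector's message format.
Each message carries the sending CPU's id and that CPU's own sequence number,
so the sender needs no shared lock and the receiver checks for gaps per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4	/* assumed CPU count for the example */

/* Hypothetical message header: sending CPU and that CPU's sequence. */
struct toy_fork_msg {
	uint32_t cpu;
	uint32_t seq;
};

/*
 * Sender side: one counter per CPU, so no shared lock is needed.  A kernel
 * version would use real per-CPU data so the counters do not share a
 * cacheline either; this plain array does not model that.
 */
static uint32_t toy_send_seq[TOY_NR_CPUS];

static struct toy_fork_msg toy_fork_event(uint32_t cpu)
{
	struct toy_fork_msg msg;

	assert(cpu < TOY_NR_CPUS);
	msg.cpu = cpu;
	msg.seq = toy_send_seq[cpu]++;
	return msg;
}

/* Receiver side: next sequence number expected from each CPU. */
static uint32_t toy_expect_seq[TOY_NR_CPUS];

/* Returns true if the sequence from msg->cpu has a gap before this message. */
static bool toy_fork_msg_lost(const struct toy_fork_msg *msg)
{
	bool lost;

	assert(msg->cpu < TOY_NR_CPUS);
	lost = msg->seq != toy_expect_seq[msg->cpu];
	toy_expect_seq[msg->cpu] = msg->seq + 1;
	return lost;
}
```

Ordering between different CPUs is deliberately not checked here, which is
the part left to userspace in the suggestion above.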

Guillaume?
 
> Jesse


Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> > I switched off the page-zeroing hardware for the tests.
>
> What tests?

For the results on the darn URL.

> See, a speedup in a simple malloc+memset could be due to either a simple
> transfer of load from user to kscrubd, or it could be due to leveraging the
> page-zeroing hardware.
>
> The latter, I expect, if the workload is actually touching every byte of
> all the pages.  Is it?

If the workload is touching every byte of the page immediately after a
page fault then prezeroing is not effective. It's only useful for sparse
accesses (like page tables etc).

> If we're doing kscrubd zeroing via memset() then the total system load
> would actually be increased if the application is touching every byte, yes?

The kernel would have zeroed a page uselessly at an idle time.

> > Without zeroing hardware the zeroing actions are moved to idle
> > system time (load < /proc/sys/vm/scrub_load). It's shifting the cpu load.
>
> Right.  We'd expect that to be a net regression if the application is
> touching all of the memory and a net win if it is touching the memory
> sparsely, yes?

There will be no regression (as shown on the unnamed URL) if kscrubd
is only run during idle times (and there will also be no regression if
the known zeroed pages are returned to the hotlists and then
used).

Kscrubd is an experimental configuration option. Switch it off (the default)
and the zero hotlists are only populated by the return of known zeroed
pages via free_hot_zeroed_page etc.





Real-Time Preemption and RCU

2005-03-17 Thread Paul E. McKenney
Hello!

As promised/threatened earlier in another forum...

Thanx, Paul



The Real-Time Preemption patch modified RCU to permit its read-side
critical sections to be safely preempted.  This has worked, but there
are a few serious problems with this variant of RCU.  [If you just want
to skip directly to the code, search for "synchronize_kernel(void)".
There are five occurrences, each a variation on the theme.  I recommend
the fifth one.  The third one might also be OK in some environments.
If you have better approaches, please do not keep them a secret!!!]

So, why am I saying that there are problems with the real-time preemption
implementation of RCU?

o   RCU read-side critical sections cannot be freely nested,
since the read-side critical section now acquires locks.
This means that the real-time preemption variant of RCU
is subject to deadlock conditions that "classic" RCU
is immune to.  This is not just a theoretical concern;
for example, see nf_hook_slow() in linux/net/core/netfilter.c:

+   /*
+* PREEMPT_RT semantics: different-type read-locks
+* dont nest that easily:
+*/
+// rcu_read_lock_read(_lock);

A number of other RCU read-side critical sections have been
similarly disabled (17 total in the patch).  Perhaps most embedded
systems will not be using netfilter heavily, but this does bring
up a very real stability concern, even on UP systems (since
preemption will eventually occur when and where least expected).

o   RCU read-side critical sections cannot be unconditionally
upgraded to write-side critical sections in all cases.
For example, in classic RCU, it is perfectly legal to do
the following:

	rcu_read_lock();
	list_for_each_entry_rcu(lep, head, p) {
		if (p->needs_update) {
			spin_lock(_lock);
			update_it(p);
			spin_unlock(_lock);
		}
	}
	rcu_read_unlock();

This would need to change to the following for real-time
preempt kernels:

	rcu_read_lock_spin(_lock);
	list_for_each_entry_rcu(lep, head, p) {
		if (p->needs_update) {
			spin_lock(_lock);
			update_it(p);
			spin_unlock(_lock);
		}
	}
	rcu_read_unlock_spin(_lock);

This results in self-deadlock.

o   There is an API expansion, with five different variants
of rcu_read_lock():

	API                           # uses
	---------------------------   ------
	rcu_read_lock_spin()              11
	rcu_read_unlock_spin()            12
	rcu_read_lock_read()              42
	rcu_read_unlock_read()            42
	rcu_read_lock_bh_read()            2
	rcu_read_unlock_bh_read()          3
	rcu_read_lock_down_read()         14
	rcu_read_unlock_up_read()         20
	rcu_read_lock_nort()               3
	rcu_read_unlock_nort()             4

	TOTAL                            153

o   The need to modify lots of RCU code expands the size of this
patch -- roughly 10% of the 20K lines of this patch are devoted
to modifying RCU code to meet this new API.  10% may not sound
like much, but it comes to more than 2,000 lines of context diffs.
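
The self-deadlock in the second item above can be modelled in user space.
This is a toy sketch, not kernel code: toy_lock stands in for a
non-recursive spinlock, toy_trylock() returns false where a real spin_lock()
would spin forever, and it assumes (as the example above implies) that the
read-side primitive and the inner spin_lock() take the same lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock; a stand-in for a spinlock, not kernel code. */
struct toy_lock {
	bool held;
};

/* Try to take the lock; false means a real spin_lock() would spin. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Mimic a read-side section that takes the lock and then tries to upgrade
 * by taking the same lock again.  Returns true if the inner acquisition
 * could never succeed, i.e. the task would deadlock against itself.
 */
static bool toy_upgrade_deadlocks(struct toy_lock *l)
{
	bool inner_ok;

	if (!toy_trylock(l))		/* rcu_read_lock_spin() analogue */
		return true;
	inner_ok = toy_trylock(l);	/* spin_lock() on the same lock */
	toy_unlock(l);			/* rcu_read_unlock_spin() analogue */
	return !inner_ok;
}
```

With classic rcu_read_lock() the outer step acquires nothing, so the inner
spin_lock() is the first and only acquisition and the upgrade is legal.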

Seems to me that it would be good to have an RCU implementation
that meets the requirements of the Real-Time Preemption patch,
but that is 100% compatible with the "classic RCU" API.  Such
an implementation must meet a number of requirements, which are
listed at the end of this message (search for "REQUIREMENTS").

I have looked into a number of seductive but subtly broken "solutions"
to this problem.  The solution listed here is not perfect, but I believe
that it has enough advantages to be worth pursuing.  The solution is
quite simple, and I feel a bit embarrassed that it took me so long
to come up with it.  All I can say in my defense is that the idea
of -adding- locks to improve scalability and eliminate deadlocks is
quite counterintuitive to me.  And, like I said earlier, if you
know of a better approach, please don't keep it a secret!

The following verbiage steps through several variations on this
solution, as follows:

1.  "Toy" implementation that has numerous API, scalability,
and realtime problems, but is a very simple 28-line
illustration of the underlying principles.  (In case you
get excited about this being much smaller than 

Re: [PATCH] add TIMEOUT to firmware_class hotplug event

2005-03-17 Thread Greg KH
On Thu, Mar 17, 2005 at 03:34:31AM +0100, Kay Sievers wrote:
> On Tue, 2005-03-15 at 09:25 +0100, Hannes Reinecke wrote:
> > The current implementation of the firmware class breaks a fundamental
> > assumption in udevd: that the physical device can be initialised fully
> > prior to executing the next event for that device.
> 
> Here we add a TIMEOUT value to the hotplug environment of the firmware
> requesting event. I will adapt udevd not to wait for anything else, if
> it finds a TIMEOUT key.
> 
> Signed-off-by: Kay Sievers <[EMAIL PROTECTED]>

Applied, thanks.

greg k-h


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> On Thu, 17 Mar 2005, Andrew Morton wrote:
> 
> > >  http://oss.sgi.com/projects/page_fault_performance/
> >
> > Oh no, not that page again ;)
> 
> Yes indeed!
> 
> > Seems to say that prezeroing makes negligible difference to kernel builds,
> > but speeds up a big malloc+memset by 3x to 4x, yes?
> 
> Correct.
> 
> > Are there any real-worldish workloads which show an appreciable benefit?
> 
> Ummm. Big loads are our real-worldish workloads here.

Sure, but not malloc+memset+exit.

How much improvement do these big numerical tasks get from the patch?

> > The large speedup for a big memset seems odd - I assume it's simply
> > transferring CPU load from the user's process over to kscrubd.  Or is it
> > the fancy page-zeroing hardware?  How do we differentiate the two?
> 
> I switched off the page-zeroing hardware for the tests.

What tests?

See, a speedup in a simple malloc+memset could be due to either a simple
transfer of load from user to kscrubd, or it could be due to leveraging the
page-zeroing hardware.

The latter, I expect, if the workload is actually touching every byte of
all the pages.  Is it?

If we're doing kscrubd zeroing via memset() then the total system load
would actually be increased if the application is touching every byte, yes?

> > Are there any workloads which are seeing a benefit on a CPU which doesn't
> > have the zeroing hardware?
> 
> Without zeroing hardware the zeroing actions are moved to idle
> system time (load < /proc/sys/vm/scrub_load). It's shifting the cpu load.

Right.  We'd expect that to be a net regression if the application is
touching all of the memory and a net win if it is touching the memory
sparsely, yes?




Re: [PATCH] Xen/i386 cleanups - AGP bus/phys cleanups

2005-03-17 Thread Paul Mackerras
Alan Cox writes:

> On Iau, 2005-03-17 at 09:34, Paul Mackerras wrote:
> > This code needs real physical addresses, which are not the same things
> > as bus addresses.  
> 
> Not always. The code needs platform specific goodies. We've only never
> been burned so far because there isn't a box with an IOMMU and AGPGART
> where one maps through the other.

That sounds like a good way to make AGP accesses slower. :)

Seriously, given that AGP is a technology that is being superseded by
PCI Express, I think it's reasonable to look at the range of current
implementations to see what we have to cope with.  So I don't think
it's worth worrying too much about the possibility of GARTs that go
through the IOMMU.  However, the idea of having phys_to_agp/agp_to_phys
(or virt_to_agp/agp_to_virt) sounds like it wouldn't be too much
effort, if it would help Xen.

Paul.


Re: [patch][resend] convert a remaining verify_area to access_ok (was: Re: [PATCH 2.6.11-mm1] mips: more convert verify_area to access_ok) (fwd)

2005-03-17 Thread Jesper Juhl
On Thu, 17 Mar 2005, Ralf Baechle wrote:

> On Wed, Mar 16, 2005 at 10:35:09PM +0100, Jesper Juhl wrote:
> 
> > Around 2.6.11-mm1 Yoichi Yuasa found a user of verify_area that I had 
> > missed when converting everything to access_ok. The patch below still 
> > applies cleanly to 2.6.11-mm4.
> > Please apply (unless of course you already picked it up back then and 
> > have it in a queue somewhere :) .
> 
> Oh gosh, you actually converted the whole IRIX compatibility mess even,
> amazing stomach you have :-) I only noticed that when I just looked at
> Linus' tree - after buring a few hours into cleaning those files myself -
> mine are now almost free of sparse warnings.
> 
I hope I did a decent job and that you didn't waste too much time
duplicating effort...

> The last instance of verify_area() in the MIPS code is now the definition
> itself.
> 
The plan is to wait for a few months (or a few kernel releases - whichever
comes first) and then I'll send Andrew patches to remove it completely.
There are still a few related nits left, like the FPU_verify_area function
in arch/i386/math-emu/reg_ld_str.c and the rw_verify_area function in
fs/read_write.c, that I want to get out of the way first (I think I'll
probably end up attempting to rename those s/verify_area/access_ok/ and
see if people scream).


-- 
Jesper




Re: BKCVS broken ?

2005-03-17 Thread Larry McVoy
I got swamped, I'll look at this after dinner.  But you might take a look
at this: http://www.bitkeeper.com/press/2005-03-17.html which is a link
to a very simple open source BK client.  It doesn't do much except track
the head of the tree but it does that well.  It's slightly better than
that, it puts all the checkin comments in BK/ChangeLog so you don't have
to go over the wire to get those.

It's intended for someone who just wants the latest and greatest snapshot,
knows how to do cp -rp and diff -Nur, it's pretty basic.  It's not a
CVS gateway replacement but it does work for every tree on bkbits.net.
Just to be clear, we are not dropping the CVS gateway, this is "in
addition to" not "instead of".

If this turns out to be popular we can look at making a BitTorrent image
of each tree available so people can get at them without swamping us.

Don't worry about the license, it's a joke.  BSD license OK with everyone?
-- 
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com


Re: [ACPI] IDE failure on ACPI resume

2005-03-17 Thread Nate Lawson
Matthew Garrett wrote:
> On Thu, 2005-03-17 at 12:34 -0800, Nate Lawson wrote:
> > Very interesting.  I was hoping to someday have _GTF et al implemented 
> > but the ATA knowledge required was above my head.  I also strongly 
> > suspected that the info published by _GTF would likely be invalid.  Does 
> > Windows actually use that method or just hardcoded ATA initialization?
>
> I believe that Windows does use the _GTF methods.

You are correct.  A quick scan of my w2k drivers shows atapi.sys uses 
the _GTF, _GTM, and _STM methods.

--
Nate


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> >  http://oss.sgi.com/projects/page_fault_performance/
>
> Oh no, not that page again ;)

Yes indeed!

> Seems to say that prezeroing makes negligible difference to kernel builds,
> but speeds up a big malloc+memset by 3x to 4x, yes?

Correct.

> Are there any real-worldish workloads which show an appreciable benefit?

Ummm. Big loads are our real-worldish workloads here.

> The large speedup for a big memset seems odd - I assume it's simply
> transferring CPU load from the user's process over to kscrubd.  Or is it
> the fancy page-zeroing hardware?  How do we differentiate the two?

I switched off the page-zeroing hardware for the tests.

> Are there any workloads which are seeing a benefit on a CPU which doesn't
> have the zeroing hardware?

Without zeroing hardware the zeroing actions are moved to idle
system time (load < /proc/sys/vm/scrub_load). It's shifting the CPU load.

But I just fixed things up so that the kernel can return hot zeroed
pages to the pool for quicklist management. This yields zeroed pages
without kscrubd.


Re: [ACPI] IDE failure on ACPI resume

2005-03-17 Thread Matthew Garrett
On Thu, 2005-03-17 at 12:34 -0800, Nate Lawson wrote:

> Very interesting.  I was hoping to someday have _GTF et al implemented 
> but the ATA knowledge required was above my head.  I also strongly 
> suspected that the info published by _GTF would likely be invalid.  Does 
> Windows actually use that method or just hardcoded ATA initialization?

I believe that Windows does use the _GTF methods.
-- 
Matthew Garrett | [EMAIL PROTECTED]



Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> > > And given that we have separate buddy structures for zeroed and not-zeroed
> >  > pages, why is this tagging needed at all?
> >
> >  Because the buddy pointers may point to a page of the different kind. Then
> >  a merge is not possible.
>
> In that case I still don't understand, sorry.
>
> If each zone has two buddy lists, one for zeroed and one for not-zeroed,
> how can we ever get known-to-be-zeroed pages on the not-known-to-be-zeroed
> list or vice versa?

The buddy is calculated based on the position in the page struct array not
based on the list.
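
A rough sketch of that point in plain C, not taken from the patch. It
assumes the classic buddy rule (the buddy of a block differs only in bit
"order" of its index) and the order + (zero << 10) tag quoted elsewhere in
this thread; demo_page, buddy_index and can_merge are made-up names:

```c
#include <assert.h>

#define DEMO_ZERO_SHIFT 10	/* the "10" from the quoted patch lines */

/* Toy stand-in for struct page: only what the merge check needs. */
struct demo_page {
	int free;		/* block currently sits on some free list */
	unsigned long private;	/* order + (zero << DEMO_ZERO_SHIFT) */
};

/* The buddy is found from the index alone; no free list is consulted. */
static unsigned long buddy_index(unsigned long page_idx, unsigned int order)
{
	return page_idx ^ (1UL << order);
}

/* Merge only if the buddy is free, has the same order and the same kind. */
static int can_merge(const struct demo_page *map, unsigned long page_idx,
		     unsigned int order, int zero)
{
	const struct demo_page *buddy = &map[buddy_index(page_idx, order)];
	unsigned long tag;

	assert(order < (1u << DEMO_ZERO_SHIFT));	/* order must not spill into the tag */
	tag = order + ((unsigned long)(zero != 0) << DEMO_ZERO_SHIFT);
	return buddy->free && buddy->private == tag;
}
```

Because the neighbour is picked by index, it can be a free block of the
other kind, and the tag comparison is what rejects the merge in that case.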

> >
> >   #define __free_page(page) __free_pages((page), 0)
> >   #define free_page(addr) free_pages((addr),0)
> >
> >  This is what you want right?
>
> Well, it was more a question than a request.  If we do this, does it speed
> anything up?

It will be able to manage the quicklist effectively and you can avoid
having to zero a page for pte/pmd/pud/pgds.

The main benefit from prezeroing is gained for programs that do numerical
calculations based on sparse matrices or other extremely large programs
that typically also come with large sparse arrays. The optimization is
typical for operating systems in that area (even M$ does that...).



Re: [PATCH] Prezeroing V8

2005-03-17 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> On Thu, 17 Mar 2005, Andrew Morton wrote:
> 
>  > > > It's hard to know what to think about this without benchmarking numbers.
> 
>  http://oss.sgi.com/projects/page_fault_performance/

Oh no, not that page again ;)

Seems to say that prezeroing makes negligible difference to kernel builds,
but speeds up a big malloc+memset by 3x to 4x, yes?

Are there any real-worldish workloads which show an appreciable benefit?

The large speedup for a big memset seems odd - I assume it's simply
transferring CPU load from the user's process over to kscrubd.  Or is it
the fancy page-zeroing hardware?  How do we differentiate the two?

Are there any workloads which are seeing a benefit on a CPU which doesn't
have the zeroing hardware?



Where is a reference for ioctl32() usage?

2005-03-17 Thread Alan Kilian

Thanks for all the help in the past, and I'm once again knocking
at your door for more help.

I am trying to get my PCI bus device driver running on a Xeon 
64-bit FC-3 distribution.

I got the compiler warnings all cleaned up, the driver compiles and 
loads, but the test executable which was compiled on a 32-bit FC-3 
distribution is causing these messages in /var/log/messages:

Mar 17 15:42:55 noble kernel: ioctl32(boardtest:3730): Unknown cmd fd(3) cmd(8004440e){00} arg(d824) on /dev/sse0
Mar 17 15:42:55 noble kernel: ioctl32(boardtest:3730): Unknown cmd fd(3) cmd(8004440e){00} arg(d8c4) on /dev/sse0
Mar 17 15:42:55 noble kernel: ioctl32(boardtest:3730): Unknown cmd fd(3) cmd(40044414){00} arg() on /dev/sse0
Mar 17 15:42:55 noble kernel: ioctl32(boardtest:3730): Unknown cmd fd(3) cmd(80044403){00} arg(0804f780) on /dev/sse0

It's probably a simple thing to change my ioctl() interface in the
driver, but I googled myself blue in the face, and I didn't find it,
so I come to you, hat-in-hand for help.

Where can I find out how to change my driver so I can have a 32-bit
executable talk to it using ioctl()?

I did change the "type" argument in _IOR and _IOW to uint32_t from
int, but that didn't change things.
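
For reading the cmd values in those log lines, here is a small decoder
sketch. It assumes the usual x86 layout of an ioctl number (8 bits nr,
8 bits type, 14 bits size, 2 bits direction); ioc_fields, ioc_decode and
ioc_encode are made-up names, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

struct ioc_fields {
	unsigned int dir;	/* 0 = none, 1 = write, 2 = read, 3 = both */
	unsigned int size;	/* size of the argument type in bytes */
	unsigned int type;	/* the driver's "magic" character */
	unsigned int nr;	/* command number within the driver */
};

/* Split a 32-bit ioctl command into its four fields. */
static struct ioc_fields ioc_decode(uint32_t cmd)
{
	struct ioc_fields f;

	f.nr   = cmd & 0xff;
	f.type = (cmd >> 8) & 0xff;
	f.size = (cmd >> 16) & 0x3fff;
	f.dir  = (cmd >> 30) & 0x3;
	return f;
}

/* Inverse of ioc_decode(), with range checks on every field. */
static uint32_t ioc_encode(unsigned int dir, unsigned int type,
			   unsigned int nr, unsigned int size)
{
	assert(dir < 4 && type < 256 && nr < 256 && size < (1u << 14));
	return ((uint32_t)dir << 30) | ((uint32_t)size << 16) |
	       ((uint32_t)type << 8) | (uint32_t)nr;
}
```

Under that layout 8004440e decodes to a read of a 4-byte argument, type
0x44 ('D'), nr 0x0e. The size field is 4 for both int and uint32_t here,
which would be consistent with the type change making no difference.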

-Alan

-- 
- Alan Kilian 




Re: [PATCH] Prezeroing V8

2005-03-17 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > And given that we have separate buddy structures for zeroed and not-zeroed
>  > pages, why is this tagging needed at all?
> 
>  Because the buddy pointers may point to a page of the different kind. Then
>  a merge is not possible.

In that case I still don't understand, sorry.

If each zone has two buddy lists, one for zeroed and one for not-zeroed,
how can we ever get known-to-be-zeroed pages on the not-known-to-be-zeroed
list or vice versa?

>  > These are all design decisions which have been made, but they're not
>  > communicated either in the patch description or in code comments.  It's to
>  > everyone's advantage to fix that, no?
> 
>  Of course. Try to do this ASAP. Testing a patch that defines the
>  following:
> 
>  Index: linux-2.6.11/include/linux/gfp.h
>  ===
>  --- linux-2.6.11.orig/include/linux/gfp.h   2005-03-01
>  23:37:50.0 -0800
>  +++ linux-2.6.11/include/linux/gfp.h2005-03-17 14:59:06.0
>  -0800
>  @@ -125,6 +125,8 @@ extern void FASTCALL(__free_pages(struct
>   extern void FASTCALL(free_pages(unsigned long addr, unsigned int order));
>   extern void FASTCALL(free_hot_page(struct page *page));
>   extern void FASTCALL(free_cold_page(struct page *page));
>  +extern void FASTCALL(free_hot_zeroed_page(struct page *page));
>  +extern void FASTCALL(free_cold_zeroed_page(struct page *page));
> 
>   #define __free_page(page) __free_pages((page), 0)
>   #define free_page(addr) free_pages((addr),0)
> 
>  This is what you want right?

Well, it was more a question than a request.  If we do this, does it speed
anything up?



Re: linux: detect application crash

2005-03-17 Thread Robert Hancock
Allison wrote:
> Hi,
> Several times when I worked with Windows, I have had a scenario when I
> am editing a file and saved some time ago and then the application
> crashes and I lose all recent data.
> Can the operating system detect all application crashes ? If so, why
> can't the OS save the user data to disk before the application quits ?
> How does this work in Linux. I was curious if such a functionality
> already exists in Linux. If not, what are the issues involved in
> implementing this functionality.

The OS doesn't have enough information to be able to save the app's data 
in the event of a crash in a form that would be usable or meaningful, 
since only the app knows what format its data structures are in.

The app itself could do this (installing a signal handler for segfaults, 
etc.) but the problem is that whatever caused the program to crash may 
have also left its data in a messed-up state.
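
As a minimal sketch of the signal-handler idea, assuming a POSIX system:
the handler below only reports the crash and exits, since anything more
ambitious would have to trust data that may already be corrupted.
on_fatal_signal and install_crash_handler are made-up names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static const char crash_note[] = "crash: unsaved data may be inconsistent\n";

/* Runs on the fatal signal; only async-signal-safe calls are used. */
static void on_fatal_signal(int sig)
{
	(void)sig;
	if (write(STDERR_FILENO, crash_note, sizeof(crash_note) - 1) < 0) {
		/* Nothing safe can be done about a failed write here. */
	}
	_exit(1);
}

/* Install the handler for one signal; returns 0 on success, -1 on error. */
static int install_crash_handler(int signo)
{
	struct sigaction sa;

	assert(signo > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_fatal_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}
```

An application would call install_crash_handler(SIGSEGV) early in main().
A real editor would more likely autosave periodically from normal code
than try to serialize its state from inside the handler.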

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCH 2.6.11] aoe [1/12]: remove too-low cap on minor number

2005-03-17 Thread Greg KH
I've applied 11 of these 12 patches (the one from Randy was already
included) to my trees.

thanks,

greg k-h


[PATCH] avoid signed vs unsigned comparison in efi_range_is_wc()

2005-03-17 Thread Jesper Juhl

This little function in include/linux/efi.h :

static inline int efi_range_is_wc(unsigned long start, unsigned long len)
{
	int i;

	for (i = 0; i < len; i += (1UL << EFI_PAGE_SHIFT)) {
		unsigned long paddr = __pa(start + i);
		if (!(efi_mem_attributes(paddr) & EFI_MEMORY_WC))
			return 0;
	}
	/* The range checked out */
	return 1;
}

generates this warning when building with gcc -W : 

include/linux/efi.h: In function `efi_range_is_wc':
include/linux/efi.h:320: warning: comparison between signed and unsigned

It looks to me like a sufficiently large 'len' passed in could cause the 
loop to never end. Isn't it safer to make 'i' an unsigned long as well? 
Like this little patch below (which of course also kills the warning) :


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>

diff -up linux-2.6.11-mm4-orig/include/linux/efi.h 
linux-2.6.11-mm4/include/linux/efi.h
--- linux-2.6.11-mm4-orig/include/linux/efi.h   2005-03-16 15:45:35.0 
+0100
+++ linux-2.6.11-mm4/include/linux/efi.h2005-03-18 00:34:36.0 
+0100
@@ -315,7 +315,7 @@ extern struct efi_memory_map memmap;
  */
 static inline int efi_range_is_wc(unsigned long start, unsigned long len)
 {
-   int i;
+   unsigned long i;
 
for (i = 0; i < len; i += (1UL << EFI_PAGE_SHIFT)) {
unsigned long paddr = __pa(start + i);
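
For anyone wondering what the warning is about, a small user-space sketch
of the conversion rule follows. It is not kernel code: DEMO_PAGE_SHIFT is
a hypothetical stand-in for EFI_PAGE_SHIFT, and mixed_less and count_pages
are made-up names. What happens once an int index passes INT_MAX depends
on the platform, so this only shows the comparison itself and the loop
shape with an index as wide as len:

```c
#include <assert.h>
#include <limits.h>

#define DEMO_PAGE_SHIFT 12	/* hypothetical stand-in for EFI_PAGE_SHIFT */
#define DEMO_PAGE_SIZE (1UL << DEMO_PAGE_SHIFT)

/* What `i < len` evaluates when i is int and len is unsigned long:
 * the int operand is converted to unsigned long first. */
static int mixed_less(int i, unsigned long len)
{
	return (unsigned long)i < len;
}

/* Same loop shape as the patched function, counting the steps taken. */
static unsigned long count_pages(unsigned long len)
{
	unsigned long i, n = 0;

	assert(len <= ULONG_MAX - DEMO_PAGE_SIZE);	/* i must not wrap */
	for (i = 0; i < len; i += DEMO_PAGE_SIZE)
		n++;
	return n;
}
```

A negative int never compares as "less than" a small unsigned long,
because it is converted to a huge unsigned value before the comparison.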





8250 - sparse error fixes

2005-03-17 Thread Ben Dooks
Ensure __iomem on the correct bits of the serial_struct
and other definitions.

Remove the attempts to size zero length arrays, which
cause problems with sparse.

Signed-off-by: Ben Dooks <[EMAIL PROTECTED]>

diff -urN -X ../dontdiff linux-2.6.11.3-bk3/include/linux/serial.h 
linux-2.6.11.3-bk3-fix1/include/linux/serial.h
--- linux-2.6.11.3-bk3/include/linux/serial.h   2005-03-02 07:37:50.0 
+
+++ linux-2.6.11.3-bk3-fix1/include/linux/serial.h  2005-03-17 
23:08:53.0 +
@@ -45,7 +45,7 @@
int hub6;
unsigned short  closing_wait; /* time to wait before closing */
unsigned short  closing_wait2; /* no longer used... */
-   unsigned char   *iomem_base;
+   unsigned char __iomem *iomem_base;
unsigned short  iomem_reg_shift;
unsigned int    port_high;
unsigned long   iomap_base; /* cookie passed into ioremap */
diff -urN -X ../dontdiff linux-2.6.11.3-bk3/drivers/serial/8250.c 
linux-2.6.11.3-bk3-fix1/drivers/serial/8250.c
--- linux-2.6.11.3-bk3/drivers/serial/8250.c2005-03-17 22:58:47.0 
+
+++ linux-2.6.11.3-bk3-fix1/drivers/serial/8250.c   2005-03-17 
23:23:39.0 +
@@ -111,15 +111,41 @@
  * standard enumeration mechanism.   Platforms that can find all
  * serial ports via mechanisms like ACPI or PCI need not supply it.
  */
-#ifndef SERIAL_PORT_DFNS
-#define SERIAL_PORT_DFNS
-#endif
 
+#ifdef SERIAL_PORT_DFNS
 static struct old_serial_port old_serial_port[] = {
SERIAL_PORT_DFNS /* defined in asm/serial.h */
 };
 
+static inline void __init serial8240_isa_init_asmdefs(void)
+{
+   struct uart_8250_port *up;
+   int i;
+
+   for (i = 0, up = serial8250_ports; i < ARRAY_SIZE(old_serial_port);
+i++, up++) {
+   up->port.iobase   = old_serial_port[i].port;
+   up->port.irq  = irq_canonicalize(old_serial_port[i].irq);
+   up->port.uartclk  = old_serial_port[i].baud_base * 16;
+   up->port.flags= old_serial_port[i].flags;
+   up->port.hub6 = old_serial_port[i].hub6;
+   up->port.membase  = old_serial_port[i].iomem_base;
+   up->port.iotype   = old_serial_port[i].io_type;
+   up->port.regshift = old_serial_port[i].iomem_reg_shift;
+   if (share_irqs)
+   up->port.flags |= UPF_SHARE_IRQ;
+   }
+}
+
 #define UART_NR (ARRAY_SIZE(old_serial_port) + CONFIG_SERIAL_8250_NR_UARTS)
+#else
+
+#define UART_NR (CONFIG_SERIAL_8250_NR_UARTS)
+
+static inline void __init serial8240_isa_init_asmdefs(void)
+{
+}
+#endif
 
 #ifdef CONFIG_SERIAL_8250_RSA
 
@@ -2021,9 +2047,9 @@
return;
first = 0;
 
-   for (i = 0; i < UART_NR; i++) {
-   struct uart_8250_port *up = &serial8250_ports[i];
+   up = &serial8250_ports[0];
 
+   for (i = 0; i < UART_NR; i++, up++) {
up->port.line = i;
spin_lock_init(&up->port.lock);
 
@@ -2039,19 +2065,7 @@
up->port.ops = &serial8250_pops;
}
 
-   for (i = 0, up = serial8250_ports; i < ARRAY_SIZE(old_serial_port);
-i++, up++) {
-   up->port.iobase   = old_serial_port[i].port;
-   up->port.irq  = irq_canonicalize(old_serial_port[i].irq);
-   up->port.uartclk  = old_serial_port[i].baud_base * 16;
-   up->port.flags= old_serial_port[i].flags;
-   up->port.hub6 = old_serial_port[i].hub6;
-   up->port.membase  = old_serial_port[i].iomem_base;
-   up->port.iotype   = old_serial_port[i].io_type;
-   up->port.regshift = old_serial_port[i].iomem_reg_shift;
-   if (share_irqs)
-   up->port.flags |= UPF_SHARE_IRQ;
-   }
+   serial8240_isa_init_asmdefs();
 }
 
 static void __init
diff -urN -X ../dontdiff linux-2.6.11.3-bk3/drivers/serial/8250.h 
linux-2.6.11.3-bk3-fix1/drivers/serial/8250.h
--- linux-2.6.11.3-bk3/drivers/serial/8250.h2005-03-02 07:37:30.0 
+
+++ linux-2.6.11.3-bk3-fix1/drivers/serial/8250.h   2005-03-17 
23:07:10.0 +
@@ -30,7 +30,7 @@
unsigned int flags;
unsigned char hub6;
unsigned char io_type;
-   unsigned char *iomem_base;
+   unsigned char __iomem *iomem_base;
unsigned short iomem_reg_shift;
 };
 
diff -urN -X ../dontdiff linux-2.6.11.3-bk3/drivers/serial/serial_core.c 
linux-2.6.11.3-bk3-fix1/drivers/serial/serial_core.c
--- linux-2.6.11.3-bk3/drivers/serial/serial_core.c 2005-03-02 
07:37:50.0 +
+++ linux-2.6.11.3-bk3-fix1/drivers/serial/serial_core.c2005-03-17 
23:09:36.0 +
@@ -592,7 +592,7 @@
tmp.hub6= port->hub6;
tmp.io_type = port->iotype;
tmp.iomem_reg_shift = port->regshift;
-   tmp.iomem_base  = (void *)port->mapbase;
+   tmp.iomem_base  = (void __iomem *)port->mapbase;
 
if 

Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> > > It's hard to know what to think about this without benchmarking numbers.

http://oss.sgi.com/projects/page_fault_performance/



Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> OK, so we're splitting each zone's buddy structure into two: one for zeroed
> pages and one for not-zeroed pages, yes?

Right.

> It's not obvious what the page->private of freed pages are being used for.
> Please comment that.

Ok.

> What's all this (zero << 10) stuff?
>
> + page->private = order + (zero << 10);
> +   (page_zorder(page) == order + (zero << 10)) &&
>
> Doesn't this explode if we already have order-1024 pages in there?  I guess
> that's a reasonable restriction, but where did the "10" come from?
> Non-obvious, needs commenting.

Yes it will fail if we have pages of the size of 2^1036.
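
To spell out the packing being discussed, here is a user-space sketch. It
assumes only the two quoted patch lines: the order lives in the low 10
bits and the zeroed flag in bit 10. zorder_pack, zorder_order and
zorder_is_zeroed are made-up helper names, not functions from the patch:

```c
#include <assert.h>

#define ZERO_TAG_SHIFT 10	/* taken from the quoted "(zero << 10)" */
#define ORDER_MASK ((1UL << ZERO_TAG_SHIFT) - 1)

/* Combine an order and a zeroed flag into one page->private style word. */
static unsigned long zorder_pack(unsigned int order, int zero)
{
	assert(order <= ORDER_MASK);	/* order 1024 would alias the tag bit */
	return order + ((unsigned long)(zero != 0) << ZERO_TAG_SHIFT);
}

/* Recover the order from the packed word. */
static unsigned int zorder_order(unsigned long private)
{
	return (unsigned int)(private & ORDER_MASK);
}

/* Recover the zeroed flag from the packed word. */
static int zorder_is_zeroed(unsigned long private)
{
	return (int)((private >> ZERO_TAG_SHIFT) & 1);
}
```

With this layout an order-1024 not-zeroed block would pack to the same
value as an order-0 zeroed one, which is the collision asked about above.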

> And given that we have separate buddy structures for zeroed and not-zeroed
> pages, why is this tagging needed at all?

Because the buddy pointers may point to a page of the different kind. Then
a merge is not possible.

> These are all design decisions which have been made, but they're not
> communicated either in the patch description or in code comments.  It's to
> everyone's advantage to fix that, no?

Of course. Try to do this ASAP. Testing a patch that defines the
following:

Index: linux-2.6.11/include/linux/gfp.h
===
--- linux-2.6.11.orig/include/linux/gfp.h   2005-03-01
23:37:50.0 -0800
+++ linux-2.6.11/include/linux/gfp.h2005-03-17 14:59:06.0
-0800
@@ -125,6 +125,8 @@ extern void FASTCALL(__free_pages(struct
 extern void FASTCALL(free_pages(unsigned long addr, unsigned int order));
 extern void FASTCALL(free_hot_page(struct page *page));
 extern void FASTCALL(free_cold_page(struct page *page));
+extern void FASTCALL(free_hot_zeroed_page(struct page *page));
+extern void FASTCALL(free_cold_zeroed_page(struct page *page));

 #define __free_page(page) __free_pages((page), 0)
 #define free_page(addr) free_pages((addr),0)

This is what you want right?



[PATCH 1/2] PCI-PCI transparent bridge handling improvements (pci core)

2005-03-17 Thread Dominik Brodowski
"Transparent" PCI-PCI bridges are currently "ignored" by the resource
management code in the PCI core. This means devices behind the bridge are
handled as if there was no bridge.

However, it seems more suitable -- and it seems to allow for proper
"prefetch"-type memory handling, too -- to handle a transparent PCI-PCI bridge 
like any other PCI-PCI bridge, and to only break out of the limits set by
the bridge windows if the resource allocation code determines it needs to 
do so.
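
The fallback can be pictured with a toy model. This is only a sketch with
made-up types: demo_bus stands in for struct pci_bus, and a single byte
counter stands in for a bridge window. The real allocation deals with
ranges, alignment and resource types:

```c
#include <assert.h>
#include <stddef.h>

struct demo_bus {
	struct demo_bus *parent;	/* NULL for the root bus */
	int transparent;		/* bridge leading to this bus is transparent */
	unsigned long free_bytes;	/* toy stand-in for the bridge window */
};

/* Try to satisfy the request from one bus only; 0 on success. */
static int demo_one_bus_alloc(struct demo_bus *bus, unsigned long size)
{
	if (bus->free_bytes < size)
		return -1;
	bus->free_bytes -= size;
	return 0;
}

/* Try the bus itself first, then walk up through transparent bridges.
 * Returns the bus that satisfied the request, or NULL. */
static struct demo_bus *demo_bus_alloc(struct demo_bus *bus, unsigned long size)
{
	int ret;

	assert(bus != NULL);
	ret = demo_one_bus_alloc(bus, size);
	while (ret && bus->transparent && bus->parent) {
		bus = bus->parent;
		ret = demo_one_bus_alloc(bus, size);
	}
	return ret ? NULL : bus;
}
```

A request that fits the bridge window stays inside it; only a request
that does not fit escapes to the parent, and only across a transparent
bridge.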

The tricky part is in pci_find_parent_resource(). There are two types of
functions calling it: some functions already know the exact resource for
which they want to find the parent in order to properly insert it into the
resource database. This can be handled easily -- if the resource is inside
the bridge window, this is returned; if it isn't, the bridge's parent
resource is returned.

However, two callers (yenta_socket and i2o) intend something different: they
call pci_find_parent_resource() with an empty resource and want to find out
the biggest valid resource of the proper type in order to analyze it and
adapt their own hunger for resources to it. To keep this behaviour 
backwards-compatible, we must never limit this lookup to the bridge window 
resources, but always go back to the parent bus.


This patch is a modified and (hopefully) improved derivation of Linus' 
"pcmcia-bridge-resource-management-fix.patch" included in 2.6.11-rc4-mm1.


Signed-off-by: Dominik Brodowski <[EMAIL PROTECTED]>

Index: 2.6.11++/drivers/pci/bus.c
===
--- 2.6.11++.orig/drivers/pci/bus.c 2005-03-17 00:39:00.0 +0100
+++ 2.6.11++/drivers/pci/bus.c  2005-03-17 00:39:24.0 +0100
@@ -18,22 +18,12 @@
 #include "pci.h"
 
 /**
- * pci_bus_alloc_resource - allocate a resource from a parent bus
- * @bus: PCI bus
- * @res: resource to allocate
- * @size: size of resource to allocate
- * @align: alignment of resource to allocate
- * @min: minimum /proc/iomem address to allocate
- * @type_mask: IORESOURCE_* type flags
- * @alignf: resource alignment function
- * @alignf_data: data argument for resource alignment function
+ * pci_one_bus_alloc_resource - allocate a resource from one specific bus
  *
- * Given the PCI bus a device resides on, the size, minimum address,
- * alignment and type, try to find an acceptable resource allocation
- * for a specific device resource.
+ * Always use pci_bus_alloc_resource() described below.
  */
-int
-pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+static int
+pci_one_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
unsigned long size, unsigned long align, unsigned long min,
unsigned int type_mask,
void (*alignf)(void *, struct resource *,
@@ -69,6 +59,48 @@
 }
 
 /**
+ * pci_bus_alloc_resource - allocate a resource from a parent bus
+ * @bus: PCI bus
+ * @res: resource to allocate
+ * @size: size of resource to allocate
+ * @align: alignment of resource to allocate
+ * @min: minimum /proc/iomem address to allocate
+ * @type_mask: IORESOURCE_* type flags
+ * @alignf: resource alignment function
+ * @alignf_data: data argument for resource alignment function
+ *
+ * Given the PCI bus a device resides on, the size, minimum address,
+ * alignment and type, try to find an acceptable resource allocation
+ * for a specific device resource.
+ */
+int
+pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
+   unsigned long size, unsigned long align, unsigned long min,
+   unsigned int type_mask,
+   void (*alignf)(void *, struct resource *,
+   unsigned long, unsigned long),
+   void *alignf_data)
+{
+   int ret = pci_one_bus_alloc_resource(bus, res, size, align, min,
+   type_mask, alignf, alignf_data);
+
+   /*
+* If allocation from the resources available to this bus failed,
+* and there is a transparent parent PCI-PCI bridge, we can check
+* the resources of the parent bus as well
+*/
+   while (ret && bus->self && bus->self->transparent) {
+   bus = bus->self->bus;
+   if (!bus)
+   return ret;
+   ret = pci_one_bus_alloc_resource(bus, res, size, align, min,
+   type_mask, alignf, alignf_data);
+   }
+   return ret;
+}
+
+
+/**
  * add a single device
  * @dev: device to add
  *
Index: 2.6.11++/drivers/pci/pci.c
===
--- 2.6.11++.orig/drivers/pci/pci.c 2005-03-17 00:39:00.0 +0100
+++ 2.6.11++/drivers/pci/pci.c  2005-03-17 01:12:18.0 +0100
@@ -195,18 +195,13 @@
 }
 
 /**
- * pci_find_parent_resource - return resource region of parent bus of given 
region
- * @dev: PCI device structure contains resources to be searched
- * @res: child resource record for which parent is sought
+ * 

[PATCH 2/2] PCI-PCI transparent bridge handling improvements (yenta_socket)

2005-03-17 Thread Dominik Brodowski
As a follow-up, we can make yenta_socket try harder to limit itself to the
parent bridge windows. This is done by lowering the
PCIBIOS_MIN_CARDBUS_IO and by updating yenta_allocate_res(). It now tries at
first to get resources within the bridge windows, and if they are large
enough (>=BRIDGE_{IO,MEM}_ACC), these are used. If no or only too small
resources were found, it falls back to the resources behind the parent PCI
bridge if this is "transparent". Using this patch may result in such "funny"
/proc/ioports as:

2800-28ff : PCI CardBus #07
3000-3fff : PCI Bus #02
  3000-303f : :02:08.0
3000-303f : e100
  3400-34ff : PCI CardBus #03
  3800-38ff : PCI CardBus #03
  3c00-3cff : PCI CardBus #07

There weren't enough properly aligned ports available inside PCI Bus #02 to
stuff all four (2x2) IO windows into it, so one was taken outside the
transparent PCI bridge ioport window.
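
The two-pass idea can be modelled in a few lines. This is a sketch under
simplifying assumptions, not the driver code: each pool is reduced to its
largest free block, and demo_try_sizes and demo_two_pass are made-up
names. The first pass refuses to go below the "ACC" size inside the bridge
window; the second pass accepts anything down to the minimum, here taken
from the parent:

```c
#include <assert.h>
#include <stddef.h>

/* Halve the request until it fits or drops below min; 0 means failure. */
static unsigned long demo_try_sizes(unsigned long largest_free,
				    unsigned long size, unsigned long min)
{
	do {
		if (size <= largest_free)
			return size;
		size /= 2;
	} while (size >= min);
	return 0;
}

/* Pass 1: bridge window, at least acc. Pass 2: parent, at least min. */
static unsigned long demo_two_pass(unsigned long window_free,
				   unsigned long parent_free,
				   unsigned long max, unsigned long acc,
				   unsigned long min, int *used_parent)
{
	unsigned long got;

	assert(used_parent != NULL);
	got = demo_try_sizes(window_free, max, acc);
	*used_parent = 0;
	if (got)
		return got;
	*used_parent = 1;
	return demo_try_sizes(parent_free, max, min);
}
```

With the memory values from the patch (max 4 MB, ACC 128 kB, min 16 kB), a
window with 256 kB free yields a 256 kB allocation inside the window,
while a window with only 64 kB free is skipped in favour of the parent.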

Signed-off-by: Dominik Brodowski <[EMAIL PROTECTED]>

Index: 2.6.11++/drivers/pcmcia/yenta_socket.c
===
--- 2.6.11++.orig/drivers/pcmcia/yenta_socket.c 2005-03-17 23:13:58.0 
+0100
+++ 2.6.11++/drivers/pcmcia/yenta_socket.c  2005-03-17 23:40:38.0 
+0100
@@ -518,19 +518,23 @@
  * Use an adaptive allocation for the memory resource,
  * sometimes the memory behind pci bridges is limited:
  * 1/8 of the size of the io window of the parent.
- * max 4 MB, min 16 kB.
+ * max 4 MB, min 16 kB. We try very hard to not get
+ * below the "ACC" values, though.
  */
 #define BRIDGE_MEM_MAX 4*1024*1024
+#define BRIDGE_MEM_ACC 128*1024
 #define BRIDGE_MEM_MIN 16*1024
 
 #define BRIDGE_IO_MAX 256
+#define BRIDGE_IO_ACC 256
 #define BRIDGE_IO_MIN 32
 
 #ifndef PCIBIOS_MIN_CARDBUS_IO
 #define PCIBIOS_MIN_CARDBUS_IO PCIBIOS_MIN_IO
 #endif
 
-static void yenta_allocate_res(struct yenta_socket *socket, int nr, unsigned type)
+static int yenta_try_allocate_res(struct yenta_socket *socket, int nr,
+ unsigned int type, unsigned int run)
 {
struct pci_bus *bus;
struct resource *root, *res;
@@ -550,11 +554,11 @@
res->name = bus->name;
res->flags = type;
res->start = 0;
-   res->end = 0;
+   res->end = run;
root = pci_find_parent_resource(socket->dev, res);
 
if (!root)
-   return;
+   return -ENODEV;
 
start = config_readl(socket, offset) & mask;
end = config_readl(socket, offset+4) | ~mask;
@@ -562,7 +566,8 @@
res->start = start;
res->end = end;
if (request_resource(root, res) == 0)
-   return;
+   return 0;
+
printk(KERN_INFO "yenta %s: Preassigned resource %d busy, reconfiguring...\n",
pci_name(socket->dev), nr);
res->start = res->end = 0;
@@ -571,12 +576,12 @@
if (type & IORESOURCE_IO) {
align = 1024;
size = BRIDGE_IO_MAX;
-   min = BRIDGE_IO_MIN;
+   min = run ? BRIDGE_IO_ACC : BRIDGE_IO_MIN;
start = PCIBIOS_MIN_CARDBUS_IO;
end = ~0U;
} else {
unsigned long avail = root->end - root->start;
-   int i;
+   u32 i;
size = BRIDGE_MEM_MAX;
if (size > avail/8) {
size=(avail+1)/8;
@@ -586,26 +591,36 @@
i++;
size = 1 << i;
}
-   if (size < BRIDGE_MEM_MIN)
-   size = BRIDGE_MEM_MIN;
+   i = run ? BRIDGE_MEM_ACC : BRIDGE_MEM_MIN;
+   if (size < i)
+   size = i;
min = BRIDGE_MEM_MIN;
align = size;
start = PCIBIOS_MIN_MEM;
end = ~0U;
}
-   
+
do {
if (allocate_resource(root, res, size, start, end, align, NULL, NULL)==0) {
config_writel(socket, offset, res->start);
config_writel(socket, offset+4, res->end);
-   return;
+   return 0;
}
size = size/2;
align = size;
} while (size >= min);
+
+   return -ENODEV;
+}
+
+static void yenta_allocate_res(struct yenta_socket *socket, int nr, unsigned type)
+{
+   if (!(yenta_try_allocate_res(socket, nr, type, 1)) ||
+   !(yenta_try_allocate_res(socket, nr, type, 0)))
+   return;
+
printk(KERN_INFO "yenta %s: no resource of type %x available, trying to continue...\n",
pci_name(socket->dev), type);
-   res->start = res->end = 0;
 }
 
 /*
@@ -616,7 +631,7 @@
yenta_allocate_res(socket, 0, IORESOURCE_MEM|IORESOURCE_PREFETCH);
yenta_allocate_res(socket, 1, IORESOURCE_MEM);

Re: [PATCH] pci_ids.h correction for Intel ICH7M - 2.6.11

2005-03-17 Thread Greg KH
On Fri, Mar 04, 2005 at 06:04:43PM -0800, Jason Gaston wrote:
> This patch corrects the ICH7M LPC controller DID in pci_ids.h from
> x27B1 to x27B9.  This patch was built against 2.6.11.
> If acceptable, please apply.

Applied, thanks.

greg k-h


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> On Thu, 17 Mar 2005, Andrew Morton wrote:
> 
> > Christoph Lameter <[EMAIL PROTECTED]> wrote:
> > >
> > > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > > called scrubd. /proc/sys/vm/scrubd_load, /proc/sys/vm_scrubd_start and
> > > /proc/sys/vm_scrubd_stop control the scrub daemon. See Documentation/vm/
> > > scrubd.txt
> >
> > It's hard to know what to think about this without benchmarking numbers.

?

> >
> > It would help if you could briefly describe the implementation and design
> > decisions when sending patches.
> 
> Oh. This was discussed so many times that I thought it would not be
> necessary anymore. The discussion is attached.

Add it to the changelog and maintain it, please.  It never hurts.

But that only describes why we want the feature, which is nice.  It's also
useful to explain how the feature works.  Although my preference there is
that this be done within code comments if at all appropriate.



OK, so we're splitting each zone's buddy structure into two: one for zeroed
pages and one for not-zeroed pages, yes?

It's not obvious what the page->private of freed pages is being used for. 
Please comment that.

What's all this (zero << 10) stuff?

+   page->private = order + (zero << 10);
+   (page_zorder(page) == order + (zero << 10)) &&

Doesn't this explode if we already have order-1024 pages in there?  I guess
that's a reasonable restriction, but where did the "10" come from? 
Non-obvious, needs commenting.

And given that we have separate buddy structures for zeroed and not-zeroed
pages, why is this tagging needed at all?


These are all design decisions which have been made, but they're not
communicated either in the patch description or in code comments.  It's to
everyone's advantage to fix that, no?



Re: [openib-general] [PATCH] Add PCI device ID for new Mellanox HCA

2005-03-17 Thread Greg KH
On Tue, Mar 01, 2005 at 08:42:47AM -0800, Roland Dreier wrote:
> Hi Greg,
> 
> It turns out that Mellanox decided to change the device ID at the last
> minute.  So of course there will be parts with both IDs.  Here's an
> updated patch that includes both IDs.  Please use this instead.

Applied, thanks.

greg k-h


Re: [PATCH] add TIMEOUT to firmware_class hotplug event

2005-03-17 Thread Greg KH
On Thu, Mar 17, 2005 at 12:07:55PM +0100, Kay Sievers wrote:
> On Wed, 2005-03-16 at 21:46 -0800, Greg KH wrote:
> > On Thu, Mar 17, 2005 at 03:34:31AM +0100, Kay Sievers wrote:
> > > On Tue, 2005-03-15 at 09:25 +0100, Hannes Reinecke wrote:
> > > > The current implementation of the firmware class breaks a fundamental
> > > > assumption in udevd: that the physical device can be initialised fully
> > > > prior to executing the next event for that device.
> > > 
> > > Here we add a TIMEOUT value to the hotplug environment of the firmware
> > > requesting event. I will adapt udevd not to wait for anything else, if
> > > it finds a TIMEOUT key.
> > 
> > Can't you just trigger off of the FIRMWARE variable instead?
> 
> Sure, that will work too. I just thought it would be nice to give
> userspace a hint about the event behavior the kernel expects, instead of
> adding an exception to the udevd event management?

Hm, so by adding the TIMEOUT value, we are telling userspace that we
better act on this operation soon, right?  That's a special case too :)

Anyway, sure, this is fine, I'll go add this to the driver-bk tree.

thanks,

greg k-h


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Nish Aravamudan wrote:

> > +	if (system_state != SYSTEM_RUNNING)
> > +		return;
> > +
> > +	while (avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT))
> > +		schedule_timeout(30*HZ);
>
> This is a busy-loop, unless you set the state before you call
> schedule_timeout(). Additionally, you really want to sleep 30 seconds

Ahh. Missed that thanks.

> at a time? Please use msleep() or msleep_interruptible(), unless you
> expect wait-queue events.

I want to sleep 30 seconds because the system load is unlikely to change
frequently.


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Nish Aravamudan
On Thu, 17 Mar 2005 13:43:47 -0800 (PST), Christoph Lameter
<[EMAIL PROTECTED]> wrote:
> Changelog:
> - Drop clear_pages and the approach to zero pages of higher order
>   first
> - Zero a percentage of pages from all orders to avoid fragmentation
> 
> Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> called scrubd. /proc/sys/vm/scrubd_load, /proc/sys/vm_scrubd_start and
> /proc/sys/vm_scrubd_stop control the scrub daemon. See Documentation/vm/
> scrubd.txt
> 
> In an SMP environment the scrub daemon is typically running on the most
> idle cpu. Thus a single threaded application running
> on one cpu may have the other cpu zeroing pages for it etc. The scrub
> daemon is hardly noticeable and usually finishes zeroing quickly since
> most processors are optimized for linear memory filling.
> 
> Patch against 2.6.11.3-bk3
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 



> Index: linux-2.6.11/mm/scrubd.c
> ===
> --- /dev/null   1970-01-01 00:00:00.0 +
> +++ linux-2.6.11/mm/scrubd.c2005-03-17 13:12:23.0 -0800



> +/*
> + * scrub_pgdat() will work across all this node's zones.
> + */
> +static void scrub_pgdat(pg_data_t *pgdat)
> +{
> +	int i;
> +
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	while (avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT))
> +		schedule_timeout(30*HZ);

This is a busy-loop, unless you set the state before you call
schedule_timeout(). Additionally, you really want to sleep 30 seconds
at a time? Please use msleep() or msleep_interruptible(), unless you
expect wait-queue events.

Thanks,
Nish
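
A rough sketch of the two points raised here, for illustration only. The comparison works because avenrun[] holds fixed-point values with FSHIFT (11) fractional bits, so the integer limit has to be shifted up before comparing. load_too_high() is a hypothetical helper, not part of the patch, and the kernel-side loop is shown only in a comment because it cannot run in userspace.

```c
#include <assert.h>

#define FSHIFT 11	/* fractional bits in the kernel's avenrun[] values */

/*
 * Nonzero while the 1-minute load is at or above the limit, mirroring
 * avenrun[0] >= ((unsigned long)sysctl_scrub_load << FSHIFT).
 */
static int load_too_high(unsigned long avenrun0, unsigned int load_limit)
{
	return avenrun0 >= ((unsigned long)load_limit << FSHIFT);
}

/*
 * In the kernel, a wait that actually sleeps would look roughly like
 *
 *	while (load_too_high(avenrun[0], sysctl_scrub_load))
 *		msleep_interruptible(30 * 1000);
 *
 * or, keeping schedule_timeout(), the task state has to be set first:
 *
 *	set_current_state(TASK_INTERRUPTIBLE);
 *	schedule_timeout(30 * HZ);
 *
 * Without the state change the task stays runnable and
 * schedule_timeout() returns without sleeping for the interval.
 */
```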


RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Benjamin Herrenschmidt

> On a fatal error the interface is down.  No matter what the driver
> supports (AER aware, EEH aware, unaware) all IO is likely to fail.
> Resetting a bus in a point-to-point environment like PCI Express or EEH
> (as you describe) should have little adverse effect.  The risk is the
> bus reset will cause a card reset and the driver must understand to
> re-initialize the card.  A link reset in PCI Express will not cause a
> card reset.  We assume the driver will reset its card if necessary.

Does the link side of PCIE provide a way to trigger a hard reset of the
rest of the card? If not, then it's dodgy as there may be no way to
consistently "reset" the card if it's in a bad state. I have to double
check, but I suspect that IBM's implementation of EEH-compliant PCIE
will add a full hard reset, not just a link reset.





Re: KGDB question

2005-03-17 Thread Matt Mackall
On Thu, Mar 17, 2005 at 02:29:58PM -0800, Andrew Morton wrote:
> Jesse Barnes <[EMAIL PROTECTED]> wrote:
> >
> > > kgdb patches are maintained in -mm kernels.
> > >
> > > Patches are in
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/broken-out/*kgdb*
> > >
> > > And the patch application order is described in
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/patch-series -
> > 
> > What's the latest status on these?  Last I heard, some cleanup was going to 
> > happen to make kgdb suitable for the mainline, did that ever happen?
> 
> It part-happened, then the effort seemed to die.
> 
> > Also, it would be nice if I could connect to a remote kernel running the
> > kgdb stubs w/o having to run gdb on the same ethernet segment.  Would
> > that be difficult to fix?
> 
> 
> 
> Maybe we'd have to teach kgdboe to arp for the remote debug host.  I think
> Matt was talking about that a while back.
> 
> 
> 
> If switches send the destination MAC address through unchanged then maybe
> the problem is that the switch simply doesn't know the MAC address of the
> remote debug host yet?  If the switch has its own MAC address (it doesn't,
> does it), or if it's actually a router then perhaps you should specify the
> router's MAC address and not the remote debug host's.

I haven't tried this, but I believe you need to set up kgdboe's
destination MAC address as the MAC of the next IP hop. Switches should
be invisible to kgdboe.

-- 
Mathematics is the supreme nostalgia of our time.


[no subject]

2005-03-17 Thread Martin xyz



Re: binfmt_elf padzero problems

2005-03-17 Thread Andrew Morton
Nir Tzachar <[EMAIL PROTECTED]> wrote:
>
> hello.
> 
> i am seeing a problem(?) with the patch described at:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109865760703851&w=2
> i'm using vanilla 2.6.11 (not .1/.2/.3/.4 ...)
> 
> the short version:
> padzero does not always do the right thing (more correctly, its caller,
> load_elf_binary).
>  
> the longer version:
> 
> padzero calls clear_user. clear_user first checks if the address passed
> is writable. if it is not, an error is returned. 
> the problem manifests itself when the area being cleared is not
> writable... this should not normally happen in the context of
> load_elf_binary, however it _can_ happen with the following assembly
> code (intel syntax):
> 
> section .text
> global _start
> _start:
> mov eax,0x1
> mov ebx,0x0
> int 0x80
> hlt
> 
> assembled with nasm -f elf, produces a binary with a bss segment of zero
> size, aligned to 1, and one program header.
> now, when calling padzero, elf_bss holds an address which belongs
> to .text (since no (fake) program header for .bss was created), i.e. not
> writable.
> when padzero is called, it tries to clear the rest of the .text section,
> which clearly results in an error.
> 
> thus, my (very) small binary always segfaults under 2.6.11+ 
> 
> on the other hand, i can be dead wrong.. if so, i'd like to know why...
> 

Tricky.

I guess if the bss has zero length then we can skip the zeroing of the end
of the page at the end of bss, as long as we're dead sure that we didn't
accidentally instantiate a single page on behalf of that zero-length bss.

Something like this, perhaps?


--- 25/fs/binfmt_elf.c~aThu Mar 17 14:47:35 2005
+++ 25-akpm/fs/binfmt_elf.c Thu Mar 17 14:48:44 2005
@@ -907,15 +907,17 @@ static int load_elf_binary(struct linux_
 	 * mapping in the interpreter, to make sure it doesn't wind
 	 * up getting placed where the bss needs to go.
 	 */
-	retval = set_brk(elf_bss, elf_brk);
-	if (retval) {
-		send_sig(SIGKILL, current, 0);
-		goto out_free_dentry;
-	}
-	if (padzero(elf_bss)) {
-		send_sig(SIGSEGV, current, 0);
-		retval = -EFAULT; /* Nobody gets to see this, but.. */
-		goto out_free_dentry;
+	if (likely(elf_bss != elf_brk)) {	/* Is there any bss at all? */
+		retval = set_brk(elf_bss, elf_brk);
+		if (retval) {
+			send_sig(SIGKILL, current, 0);
+			goto out_free_dentry;
+		}
+		if (padzero(elf_bss)) {
+			send_sig(SIGSEGV, current, 0);
+			retval = -EFAULT; /* Nobody gets to see this, but.. */
+			goto out_free_dentry;
+		}
 	}
 
 	if (elf_interpreter) {
_
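
To make the effect of the extra check concrete, here is a small userspace sketch of the arithmetic involved. It assumes a 4096-byte page and only models how many bytes padzero() would try to clear; pad_bytes() and pad_bytes_guarded() are hypothetical helpers, and the real function calls clear_user() on that range.

```c
#include <assert.h>

#define PAGE_SZ 4096UL	/* assumed page size for the illustration */

/* Bytes padzero() would clear: from elf_bss up to the end of its page. */
static unsigned long pad_bytes(unsigned long elf_bss)
{
	unsigned long off = elf_bss & (PAGE_SZ - 1);

	return off ? PAGE_SZ - off : 0;
}

/* With the proposed check, nothing is cleared when the bss is empty. */
static unsigned long pad_bytes_guarded(unsigned long elf_bss,
				       unsigned long elf_brk)
{
	assert(elf_bss <= elf_brk);
	return (elf_bss != elf_brk) ? pad_bytes(elf_bss) : 0;
}
```

In the reported case elf_bss equals elf_brk and points into the read-only text mapping, so the unguarded version would try to clear the tail of a page that is not writable.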




Re: KGDB question

2005-03-17 Thread Andrew Morton
Jesse Barnes <[EMAIL PROTECTED]> wrote:
>
> > kgdb patches are maintained in -mm kernels.
> >
> > Patches are in
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/broken-out/*kgdb*
> >
> > And the patch application order is described in
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/patch-series -
> 
> What's the latest status on these?  Last I heard, some cleanup was going to 
> happen to make kgdb suitable for the mainline, did that ever happen?

It part-happened, then the effort seemed to die.

> Also, it would be nice if I could connect to a remote kernel running the
> kgdb stubs w/o having to run gdb on the same ethernet segment.  Would
> that be difficult to fix?



Maybe we'd have to teach kgdboe to arp for the remote debug host.  I think
Matt was talking about that a while back.



If switches send the destination MAC address through unchanged then maybe
the problem is that the switch simply doesn't know the MAC address of the
remote debug host yet?  If the switch has its own MAC address (it doesn't,
does it), or if it's actually a router then perhaps you should specify the
router's MAC address and not the remote debug host's.


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Christoph Lameter
On Thu, 17 Mar 2005, Andrew Morton wrote:

> Christoph Lameter <[EMAIL PROTECTED]> wrote:
> >
> > Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> > called scrubd. /proc/sys/vm/scrubd_load, /proc/sys/vm_scrubd_start and
> > /proc/sys/vm_scrubd_stop control the scrub daemon. See Documentation/vm/
> > scrubd.txt
>
> It's hard to know what to think about this without benchmarking numbers.
>
> It would help if you could briefly describe the implementation and design
> decisions when sending patches.

Oh. This was discussed so many times that I thought it would not be
necessary anymore. The discussion is attached.

> For example, one area where we could use this is in pagetable management,
> where we need zeroed pages and we tend to free up known-to-be-zero and
> probably cache-warm pages.  Right now some architectures are maintaining
> their own quicklists, or using a slab cache, both of which are suboptimal.

Right.

> But afaict the patch doesn't differentiate between cache-cold and cache-hot
> zeroed pages, and doesn't have an API with which clients can free up a
> known-to-be-zero page.

end_zero_page(page, 0) would put a zeroed page back on the zeroed list.
But we may have to define a cleaner API for it. Plus this is a hot zero
page. So I would need to add a hot zero hotlist to the existing cold zero
hotlist.

 Description 

The most expensive operation in the page fault handler is (apart from SMP
locking overhead) the touching of all cache lines of a page by
zeroing the page. This zeroing means that all cachelines of the faulted
page (on Altix that means all 128 cachelines of 128 bytes each) must be
handled and later written back. This patch makes it possible to avoid
using all cachelines if only a part of the cachelines of that page is
needed immediately after the fault. Doing so will only be effective for
sparsely accessed memory which is typical for anonymous memory and pte
maps.

The patch makes prezeroing very effective by also allowing the use
of hardware support for offloading zeroing from the cpu. This avoids
the invalidation of the cpu caches by extensive zeroing operations.

The scrub daemon is invoked when the number of zeroed pages falls below a
lower threshold (defined by setting /proc/sys/vm/scrub_start) so
that it's worth running it. kscrubd then zeroes free pages until the upper
threshold is reached (set by /proc/sys/vm/scrub_stop). The zeroing
is performed on a percentage of pages at each order of freed pages.

kscrubd performs short bursts of zeroing when needed and tries to stay
off the processor as much as possible. kscrubd will only run when the load
is less than the value set in /proc/sys/vm/scrub_load (defaults to 1).

The benefits of prezeroing are reduced to minimal quantities if all
cachelines of a page are touched. Prezeroing can only be effective
if the whole page is not immediately used after the page fault.
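
The start/stop behaviour described above is a plain hysteresis loop. A minimal sketch follows, assuming both thresholds are counts of zeroed pages (the description does not state the units). struct scrub_state and scrub_update() are hypothetical names, not code from the patch.

```c
#include <assert.h>

/* Hypothetical per-zone state; field names are illustrative only. */
struct scrub_state {
	unsigned long start;	/* start zeroing when zeroed pages drop below this */
	unsigned long stop;	/* stop again once zeroed pages reach this */
	int running;		/* nonzero while the daemon is zeroing */
};

/* Update the state for the current count and report whether to zero now. */
static int scrub_update(struct scrub_state *s, unsigned long zeroed_pages)
{
	assert(s->start <= s->stop);
	if (!s->running && zeroed_pages < s->start)
		s->running = 1;
	else if (s->running && zeroed_pages >= s->stop)
		s->running = 0;
	return s->running;
}
```

Between the two thresholds the state does not change, which is what keeps the daemon to short bursts instead of toggling on every allocation.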


Re: vm_dirty_ratio seems a bit large.

2005-03-17 Thread Peter Chubb
> "Andrew" == Andrew Morton <[EMAIL PROTECTED]> writes:

Andrew> Robin Holt <[EMAIL PROTECTED]> wrote:

>>  One other issue we have is the vm_dirty_ratio and background_ratio
>> adjustments are a little coarse with these memory sizes.  Since our
>> minimum adjustment is 1%, we are adjusting by 40GB on the largest
>> configuration from above.  The hardware we are shipping today is
>> capable of going to far greater amounts of memory, but we don't
>> have customers demanding that yet.  I would like to plan ahead for
>> that and change vm_dirty_ratio from a straight percent into a
>> millipercent (thousandth of a percent).  Would that type of change
>> be acceptable?

Andrew> Oh drat.  I think such a change would require a new set of
Andrew> /proc entries.  

No, you could just extend them to understand fixed point.  Keep
printing integers as integers, print non-integers with one (or two:
will we ever need 0.01% increments?) decimal places.
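
One way the suggestion could look, as a userspace sketch only: values are kept internally in hundredths of a percent, plain integers still parse and print exactly as before, and fractional values get two decimal places. parse_centipercent() and format_centipercent() are hypothetical helpers, not existing sysctl code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "12", "12.5" or "12.05" into hundredths of a percent.
 * Returns -1 on malformed input or more than two decimals.
 */
static long parse_centipercent(const char *s)
{
	long whole = 0, frac = 0;
	int digits = 0;

	if (*s < '0' || *s > '9')
		return -1;
	while (*s >= '0' && *s <= '9')
		whole = whole * 10 + (*s++ - '0');
	if (*s == '.') {
		s++;
		while (*s >= '0' && *s <= '9' && digits < 2) {
			frac = frac * 10 + (*s++ - '0');
			digits++;
		}
		if (digits == 0)
			return -1;
		if (digits == 1)
			frac *= 10;	/* "12.5" means 12.50 */
	}
	if (*s != '\0')
		return -1;
	return whole * 100 + frac;
}

/* Print integers as integers and everything else with two decimals. */
static void format_centipercent(long v, char *buf, size_t len)
{
	assert(v >= 0);
	if (v % 100 == 0)
		snprintf(buf, len, "%ld", v / 100);
	else
		snprintf(buf, len, "%ld.%02ld", v / 100, v % 100);
}
```

Existing scripts that write "40" and read back "40" would see no difference under this scheme.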

-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*


Re: [PATCH libata-dev-2.6 04/05] libata: support descriptor sense in ctrl page

2005-03-17 Thread Brett Russ
04_libata_control_pg_desc_bit.patch

libata must support the descriptor format sense blocks as they
are required to properly report results of ATA pass through
commands as well as other SCSI commands reporting 48b LBAs.
This patch adjusts the control mode page to properly report
this.

Signed-off-by: Brett Russ <[EMAIL PROTECTED]>

 libata-scsi.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletion(-)

Index: libata-dev-2.6/drivers/scsi/libata-scsi.c
===
--- libata-dev-2.6.orig/drivers/scsi/libata-scsi.c  2005-03-17 17:16:58.0 -0500
+++ libata-dev-2.6/drivers/scsi/libata-scsi.c   2005-03-17 17:16:58.0 -0500
@@ -1370,7 +1370,12 @@ static unsigned int ata_msense_caching(u
 
 static unsigned int ata_msense_ctl_mode(u8 **ptr_io, const u8 *last)
 {
-   const u8 page[] = {0xa, 0xa, 2, 0, 0, 0, 0, 0, 0xff, 0xff, 0, 30};
+   const u8 page[] = {0xa, 0xa, 6, 0, 0, 0, 0, 0, 0xff, 0xff, 0, 30};
+
+   /* byte 2: set the descriptor format sense data bit (bit 2)
+* since we need to support returning this format for SAT
+* commands and any SCSI commands against a 48b LBA device.
+*/
 
ata_msense_push(ptr_io, last, page, sizeof(page));
return sizeof(page);
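
The change from 2 to 6 in byte 2 is just bit 2 being set on top of the bit that was already there. A small sketch of that arithmetic, with CTL_DSENSE_BIT and both helpers being hypothetical names chosen for illustration:

```c
#include <assert.h>

#define CTL_DSENSE_BIT (1u << 2)	/* hypothetical name for bit 2 of byte 2 */

/* Byte 2 of the control mode page with descriptor-format sense enabled. */
static unsigned char ctl_byte2_with_dsense(unsigned char byte2)
{
	return (unsigned char)(byte2 | CTL_DSENSE_BIT);
}

/* Nonzero if the page advertises descriptor-format sense data. */
static int ctl_dsense_enabled(const unsigned char *page)
{
	return (page[2] & CTL_DSENSE_BIT) != 0;
}
```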



Re: [PATCH libata-dev-2.6 01/05] libata: AHCI tf_read() support

2005-03-17 Thread Brett Russ
01_libata_garzik-ahci-tf-read.patch

(included in libata-2.6) This is Jeff's tf_read() support
patch for AHCI.

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

 ahci.c |   11 +++
 1 files changed, 11 insertions(+)

Index: libata-dev-2.6/drivers/scsi/ahci.c
===
--- libata-dev-2.6.orig/drivers/scsi/ahci.c 2005-03-17 12:36:29.0 -0500
+++ libata-dev-2.6/drivers/scsi/ahci.c  2005-03-17 17:16:57.0 -0500
@@ -179,6 +179,7 @@ static void ahci_eng_timeout(struct ata_
 static int ahci_port_start(struct ata_port *ap);
 static void ahci_port_stop(struct ata_port *ap);
 static void ahci_host_stop(struct ata_host_set *host_set);
+static void ahci_tf_read(struct ata_port *ap, struct ata_taskfile *tf);
 static void ahci_qc_prep(struct ata_queued_cmd *qc);
 static u8 ahci_check_status(struct ata_port *ap);
 static u8 ahci_check_err(struct ata_port *ap);
@@ -213,6 +214,8 @@ static struct ata_port_operations ahci_o
.check_err  = ahci_check_err,
.dev_select = ata_noop_dev_select,
 
+   .tf_read= ahci_tf_read,
+
.phy_reset  = ahci_phy_reset,
 
.qc_prep= ahci_qc_prep,
@@ -466,6 +469,14 @@ static u8 ahci_check_err(struct ata_port
return (readl(mmio + PORT_TFDATA) >> 8) & 0xFF;
 }
 
+static void ahci_tf_read(struct ata_port *ap, struct ata_taskfile *tf)
+{
+   struct ahci_port_priv *pp = ap->private_data;
+   u8 *d2h_fis = pp->rx_fis + RX_FIS_D2H_REG;
+
+   ata_tf_from_fis(d2h_fis, tf);
+}
+
 static void ahci_fill_sg(struct ata_queued_cmd *qc)
 {
struct ahci_port_priv *pp = qc->ap->private_data;



Re: [PATCH libata-dev-2.6 05/05] libata: rework how CCs generated

2005-03-17 Thread Brett Russ
05_libata_split_ata_to_sense_error.patch

This patch fixes several bugs as well as reorganizes the way
check conditions are generated.  Bugs fixed: 1) in
ata_scsi_qc_complete(), ATA_12/16 commands wouldn't call
ata_pass_thru_cc() on error status; 2) ata_pass_thru_cc()
wouldn't put the SK, ASC, and ASCQ from ata_to_sense_error()
in the correct place in the sense block because
ata_to_sense_error() was writing a fixed sense block.

Per the recommendations in the comments, ata_to_sense_error()
is now split into 3 parts.  The existing fcn is only used for
outputting a sense key/ASC/ASCQ triplet.  A new function
ata_dump_status() was created to print the error info, similar
to the ide variety.  A third function ata_gen_fixed_sense()
was created to generate a fixed length sense block.  I added
the use of the info field for 28b LBAs only.
ata_pass_thru_cc() renamed to ata_gen_ata_desc_sense() to
match naming convention, presumably to include another
descriptor format function in the future (see question 2
below).

Questions:

1) I made the ata_gen_..._sense() routines read the status
   register themselves rather than use the drv_stat values
   that used to be passed in.  These values seemed
   unreliable/useless since they were often hard coded (see
   calls to ata_qc_complete() for origins of most drv_stat
   variables).  Sound ok?

2) the SAT spec has little about error handling and sense
   information, specifically what descriptor format is valid
   for use by SAT commands.  I want to use descriptor type 00
   (information) in my next patch until a spec says
   differently.  Sound ok?

Signed-off-by: Brett Russ <[EMAIL PROTECTED]>

 libata-scsi.c |  342 +-
 libata.h  |1 
 2 files changed, 197 insertions(+), 146 deletions(-)

Index: libata-dev-2.6/drivers/scsi/libata-scsi.c
===
--- libata-dev-2.6.orig/drivers/scsi/libata-scsi.c  2005-03-17 17:16:58.0 -0500
+++ libata-dev-2.6/drivers/scsi/libata-scsi.c   2005-03-17 17:16:59.0 -0500
@@ -331,24 +331,69 @@ struct ata_queued_cmd *ata_scsi_qc_new(s
 }
 
 /**
+ * ata_dump_status - user friendly display of error info
+ * @id: id of the port in question
+ * @tf: ptr to filled out taskfile
+ *
+ * Decode and dump the ATA error/status registers for the user so
+ * that they have some idea what really happened at the non
+ * make-believe layer.
+ *
+ * LOCKING:
+ * inherited from caller
+ */
+void ata_dump_status(unsigned id, struct ata_taskfile *tf)
+{
+   u8 stat = tf->command, err = tf->feature;
+
+   printk(KERN_WARNING "ata%u: status=0x%02x { ", id, stat);
+   if (stat & ATA_BUSY) {
+   printk("Busy }\n"); /* Data is not valid in this case */
+   } else {
+   if (stat & 0x40)printk("DriveReady ");
+   if (stat & 0x20)printk("DeviceFault ");
+   if (stat & 0x10)printk("SeekComplete ");
+   if (stat & 0x08)printk("DataRequest ");
+   if (stat & 0x04)printk("CorrectedError ");
+   if (stat & 0x02)printk("Index ");
+   if (stat & 0x01)printk("Error ");
+   printk("}\n");
+
+   if (err) {
+   printk(KERN_WARNING "ata%u: error=0x%02x { ", id, err);
+   if (err & 0x04) printk("DriveStatusError ");
+   if (err & 0x80) {
+   if (err & 0x04) printk("BadCRC ");
+   else printk("Sector ");
+   }
+   if (err & 0x40) printk("UncorrectableError ");
+   if (err & 0x10) printk("SectorIdNotFound ");
+   if (err & 0x02) printk("TrackZeroNotFound ");
+   if (err & 0x01) printk("AddrMarkNotFound ");
+   printk("}\n");
+   }
+   }
+}
+
+/**
  * ata_to_sense_error - convert ATA error to SCSI error
- * @qc: Command that we are erroring out
  * @drv_stat: value contained in ATA status register
+ * @drv_err: value contained in ATA error register
+ * @sk: the sense key we'll fill out
+ * @asc: the additional sense code we'll fill out
+ * @ascq: the additional sense code qualifier we'll fill out
  *
- * Converts an ATA error into a SCSI error. While we are at it
- * we decode and dump the ATA error for the user so that they
- * have some idea what really happened at the non make-believe
- * layer.
+ * Converts an ATA error into a SCSI error.  Fill out 

Re: [PATCH libata-dev-2.6 03/05] libata: update ATA PT sense desc code

2005-03-17 Thread Brett Russ
03_libata_update_desc_code.patch

Change the ATA pass through sense block descriptor code to
0x09 per SAT

Signed-off-by: Brett Russ <[EMAIL PROTECTED]>

 libata-scsi.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: libata-dev-2.6/drivers/scsi/libata-scsi.c
===
--- libata-dev-2.6.orig/drivers/scsi/libata-scsi.c  2005-03-08 08:47:48.0 -0500
+++ libata-dev-2.6/drivers/scsi/libata-scsi.c   2005-03-17 17:16:58.0 -0500
@@ -531,7 +531,7 @@ void ata_pass_thru_cc(struct ata_queued_
 */
sb[0] = 0x72 ;
 
-   desc[0] = 0x8e ;/* TODO: replace with official value. */
+   desc[0] = 0x09;
 
/*
 * Set length of additional sense data.
@@ -2059,7 +2059,7 @@ void ata_scsi_simulate(u16 *id,
ata_scsi_rbuf_fill(&args, ata_scsiop_report_luns);
break;
 
-   /* mandantory commands we haven't implemented yet */
+   /* mandatory commands we haven't implemented yet */
case REQUEST_SENSE:
 
/* all other commands */



Re: [PATCH libata-dev-2.6 02/05] libata: AHCI error handling fix

2005-03-17 Thread Brett Russ
02_libata_ahci-err-int.patch

(included in libata-2.6) Fixes AHCI bits during handling of
fatal error int.

Signed-off-by: Brett Russ <[EMAIL PROTECTED]>

 ahci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: libata-dev-2.6/drivers/scsi/ahci.c
===
--- libata-dev-2.6.orig/drivers/scsi/ahci.c 2005-03-17 17:16:57.0 -0500
+++ libata-dev-2.6/drivers/scsi/ahci.c  2005-03-17 17:16:57.0 -0500
@@ -548,7 +548,7 @@ static void ahci_intr_error(struct ata_p
 
/* stop DMA */
tmp = readl(port_mmio + PORT_CMD);
-   tmp &= PORT_CMD_START | PORT_CMD_FIS_RX;
+   tmp &= ~PORT_CMD_START;
writel(tmp, port_mmio + PORT_CMD);
 
/* wait for engine to stop.  TODO: this could be
@@ -580,7 +580,7 @@ static void ahci_intr_error(struct ata_p
 
/* re-start DMA */
tmp = readl(port_mmio + PORT_CMD);
-   tmp |= PORT_CMD_START | PORT_CMD_FIS_RX;
+   tmp |= PORT_CMD_START;
writel(tmp, port_mmio + PORT_CMD);
readl(port_mmio + PORT_CMD); /* flush */
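
The first hunk is easy to misread, so here is a small sketch of what the old and new masks do to the register value. The bit positions are an assumption based on the AHCI port command register layout (start in bit 0, FIS receive enable in bit 4); stop_dma_old() and stop_dma_new() are hypothetical helpers that only model the arithmetic.

```c
#include <assert.h>

/* Assumed bit positions, following the AHCI port command register layout. */
#define PORT_CMD_START  (1u << 0)
#define PORT_CMD_FIS_RX (1u << 4)

/* Old code: keeps only the two named bits, so START is never cleared. */
static unsigned int stop_dma_old(unsigned int tmp)
{
	return tmp & (PORT_CMD_START | PORT_CMD_FIS_RX);
}

/* Patched code: clears START and preserves every other bit. */
static unsigned int stop_dma_new(unsigned int tmp)
{
	return tmp & ~PORT_CMD_START;
}
```

With a register value of 0x17, the old expression yields 0x11 (START still set, unrelated bits lost) and the patched one yields 0x16.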
 



[PATCH libata-dev-2.6 00/05] libata: scsi error handling improvements

2005-03-17 Thread Brett Russ
This patch series attempts to clean up the SCSI error handling a bit.
See comments in below TOC or patch emails.  All of the below have been
tested in success and error paths through the VERIFY_10 and ATA_16
commands using the AHCI driver.

IMPORTANT: the patchset below against libata-dev-2.6 relies on the
recent AHCI driver fixes recently patched into libata-2.6.  I am
including the two specific patches as 1 and 2 of this series for
completeness, although of course they should be merged from libata-2.6
instead.  Therefore, you may ignore these two unless you want to test
this series now on libata-dev.

[ Start of patch descriptions ]

01_libata_garzik-ahci-tf-read.patch
: AHCI tf_read() support

(included in libata-2.6) This is Jeff's tf_read() support
patch for AHCI.

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

02_libata_ahci-err-int.patch
: AHCI error handling fix

(included in libata-2.6) Fixes AHCI bits during handling of
fatal error int.

03_libata_update_desc_code.patch
: update ATA PT sense desc code

Change the ATA pass through sense block descriptor code to
0x09 per SAT

04_libata_control_pg_desc_bit.patch
: support descriptor sense in ctrl page

libata must support the descriptor format sense blocks as they
are required to properly report results of ATA pass through
commands as well as other SCSI commands reporting 48b LBAs.
This patch adjusts the control mode page to properly report
this.

05_libata_split_ata_to_sense_error.patch
: rework how CCs generated

This patch fixes several bugs as well as reorganizes the way
check conditions are generated.  Bugs fixed: 1) in
ata_scsi_qc_complete(), ATA_12/16 commands wouldn't call
ata_pass_thru_cc() on error status; 2) ata_pass_thru_cc()
wouldn't put the SK, ASC, and ASCQ from ata_to_sense_error()
in the correct place in the sense block because
ata_to_sense_error() was writing a fixed sense block.

Per the recommendations in the comments, ata_to_sense_error()
is now split into 3 parts.  The existing fcn is only used for
outputting a sense key/ASC/ASCQ triplet.  A new function
ata_dump_status() was created to print the error info, similar
to the ide variety.  A third function ata_gen_fixed_sense()
was created to generate a fixed length sense block.  I added
the use of the info field for 28b LBAs only.
ata_pass_thru_cc() renamed to ata_gen_ata_desc_sense() to
match naming convention, presumably to include another
descriptor format function in the future (see question 2
below).

Questions:

1) I made the ata_gen_..._sense() routines read the status
   register themselves rather than use the drv_stat values
   that used to be passed in.  These values seemed
   unreliable/useless since they were often hard coded (see
   calls to ata_qc_complete() for origins of most drv_stat
   variables).  Sound ok?

2) the SAT spec has little about error handling and sense
   information, specifically what descriptor format is valid
   for use by SAT commands.  I want to use descriptor type 00
   (information) in my next patch until a spec says
   differently.  Sound ok?

[ End of patch descriptions ]

BR



Re: Awful long timeouts for flash-file-system

2005-03-17 Thread Voluspa
On Thu, 17 Mar 2005 05:06:23 +0100 Voluspa wrote:



Went back to 2.6.10 and just got one of those dma_timer_expiry freezes.
Seems the disk is on the blink then. Sorry about the noise.

Mvh
Mats Johannesson
--


Memory Stick Changes in 2.6.11?

2005-03-17 Thread Bill Davidsen
I was pulling some pictures out of memory sticks from a camera, and 
after I pulled them I was removing the image files from the stick. One 
of the sticks mounted read-only. After a few attempts to explicitly use 
"rw" in the mount command and things like that, I booted back into 
2.6.10 and found the stick mounted rw.

I looked at the code, and I don't see anything obvious. Can someone 
point me to where the change is made?

OT: I think that if I explicitly use the rw option the mount should do 
what I ask or fail. This "I can't do what you want so I did something 
else" behaviour makes scripts more complex.

--
   -bill davidsen ([EMAIL PROTECTED])
"The secret to procrastination is to put things off until the
 last possible moment - but no longer"  -me


Re: KGDB question

2005-03-17 Thread Jesse Barnes
On Thursday, March 17, 2005 1:54 pm, Andrew Morton wrote:
> "Abhinkar, Sameer" <[EMAIL PROTECTED]> wrote:
> > Are there any patches or hooks
> > available to enable KGDB for linux-2.6.11.2?
>
> kgdb patches are maintained in -mm kernels.
>
> Patches are in
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/broken-out/*kgdb*
>
> And the patch application order is described in
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/patch-series

What's the latest status on these?  Last I heard, some cleanup was going to 
happen to make kgdb suitable for the mainline; did that ever happen?  Also, 
it would be nice if I could connect to a remote kernel running the kgdb stubs 
w/o having to run gdb on the same ethernet segment.  Would that be difficult 
to fix?

Thanks,
Jesse


Re: [PATCH] Prezeroing V8

2005-03-17 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> Adds management of ZEROED and NOT_ZEROED pages and a background daemon
> called scrubd. /proc/sys/vm/scrubd_load, /proc/sys/vm_scrubd_start and
> /proc/sys/vm_scrubd_stop control the scrub daemon. See Documentation/vm/
> scrubd.txt

It's hard to know what to think about this without benchmarking numbers.

It would help if you could briefly describe the implementation and design
decisions when sending patches.

For example, one area where we could use this is in pagetable management,
where we need zeroed pages and we tend to free up known-to-be-zero and
probably cache-warm pages.  Right now some architectures are maintaining
their own quicklists, or using a slab cache, both of which are suboptimal.

But afaict the patch doesn't differentiate between cache-cold and cache-hot
zeroed pages, and doesn't have an API with which clients can free up a
known-to-be-zero page.



Re: [patch 1/2] fork_connector: add a fork connector

2005-03-17 Thread Jesse Barnes
On Thursday, March 17, 2005 1:38 pm, Evgeniy Polyakov wrote:
> The most significant part there - is requirement to store
> u32 seq in each CPU's cache and thus flush cacheline +
> invalidate/get from mem on each other cpus
> each time it is accessed, which is a big price.

Same thing has to happen with the lock.  To put it simply, writing global 
variables from multiple CPUs with anything other than very low frequency is 
bad.

> It is totally Guillaume's work - so he decides,
> I would recommend per cpu counters and processor's
> id in each message.
> And of course userspace should take care of misordered
> messages.
> I personally prefer such mechanism.

Yep, I agree.  Hopefully Guillaume will too :)
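
A rough userspace sketch of the per-CPU counter idea discussed above:
each message carries the originating CPU and that CPU's private
sequence number, and the receiver tracks one expected value per CPU.
All names here are hypothetical and this is not the fork connector
code; in a kernel the counters would be real per-CPU variables rather
than a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4

struct sketch_fork_msg {
	uint32_t cpu;	/* CPU that generated the event */
	uint32_t seq;	/* that CPU's private sequence number */
};

/* Producer side: one counter per CPU, so no global counter is written
 * by several CPUs at once. */
static uint32_t sketch_seq[SKETCH_NR_CPUS];

static struct sketch_fork_msg sketch_emit(uint32_t cpu)
{
	struct sketch_fork_msg m;

	assert(cpu < SKETCH_NR_CPUS);
	m.cpu = cpu;
	m.seq = sketch_seq[cpu]++;
	return m;
}

/* Consumer side: next expected sequence number for each CPU. */
static uint32_t sketch_expected[SKETCH_NR_CPUS];

/* Returns how many messages from m.cpu were skipped before this one.
 * Interleaving between different CPUs is normal and is not counted.
 * Assumes messages from a single CPU arrive in order. */
static uint32_t sketch_receive(struct sketch_fork_msg m)
{
	uint32_t missed;

	assert(m.cpu < SKETCH_NR_CPUS);
	missed = m.seq - sketch_expected[m.cpu];	/* unsigned, wrap-safe */
	sketch_expected[m.cpu] = m.seq + 1;
	return missed;
}
```

With this layout the receiver cannot derive a total order across CPUs
from the sequence numbers alone; it can only detect loss per CPU, which
is the trade-off accepted in exchange for not sharing a counter.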

Jesse


Re: KGDB question

2005-03-17 Thread Andrew Morton
"Abhinkar, Sameer" <[EMAIL PROTECTED]> wrote:
>
> Are there any patches or hooks
> available to enable KGDB for linux-2.6.11.2? 

kgdb patches are maintained in -mm kernels.

Patches are in 
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/broken-out/*kgdb*

And the patch application order is described in

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/patch-series


Re: [PATCH 2/2] NFS: add I/O performance counters

2005-03-17 Thread Andrew Morton
[EMAIL PROTECTED] (Chuck Lever) wrote:
>
> +static inline void nfs_inc_stats(struct inode *inode, unsigned int stat)
> +{
> + struct nfs_iostats *iostats = NFS_SERVER(inode)->io_stats;
> + iostats[smp_processor_id()].counts[stat]++;
> +}

The use of smp_processor_id() outside locks should spit a runtime warning. 
And it is racy: if you switch CPUs between the read and the write (via
preemption), the stats will be corrupted.

A preempt_disable()/enable() will fix those things up.
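
For illustration, a userspace model of that fix.  The
preempt_disable(), preempt_enable() and smp_processor_id() below are
stand-in stubs written for this sketch (the kernel versions are not
available in userspace), and the sketch_* names are hypothetical.

```c
#include <assert.h>

#define SKETCH_NR_CPUS  4
#define SKETCH_NR_STATS 8

/* Userspace stand-ins for the kernel primitives (assumptions). */
static int sketch_preempt_count;	/* > 0 means preemption is disabled */
static int sketch_current_cpu;		/* CPU the task is "running on" */

static void preempt_disable(void) { sketch_preempt_count++; }
static void preempt_enable(void)  { sketch_preempt_count--; }

static int smp_processor_id(void)
{
	/* Models the runtime check: only valid with preemption off. */
	assert(sketch_preempt_count > 0);
	return sketch_current_cpu;
}

struct sketch_iostats {
	unsigned long counts[SKETCH_NR_STATS];
};

static struct sketch_iostats sketch_stats[SKETCH_NR_CPUS];

/* The CPU lookup and the increment happen in one non-preemptible
 * window, so the task cannot migrate between the two. */
static void sketch_inc_stats(unsigned int stat)
{
	assert(stat < SKETCH_NR_STATS);
	preempt_disable();
	sketch_stats[smp_processor_id()].counts[stat]++;
	preempt_enable();
}

/* Readers fold the per-CPU slots into one total. */
static unsigned long sketch_sum_stat(unsigned int stat)
{
	unsigned long sum = 0;
	int cpu;

	assert(stat < SKETCH_NR_STATS);
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += sketch_stats[cpu].counts[stat];
	return sum;
}
```

In kernel code the same pairing is commonly written with get_cpu() and
put_cpu(), which disable preemption and return the CPU number in one
step.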

> +static inline struct nfs_iostats *nfs_alloc_iostats(void)
> +{
> + struct nfs_iostats *new;
> + new = kmalloc(sizeof(struct nfs_iostats) * NR_CPUS, GFP_KERNEL);
> + if (new)
> + memset(new, 0, sizeof(struct nfs_iostats) * NR_CPUS);
> + return new;
> +}
> +

You'd be better off using alloc_percpu() here, so each CPU's counter goes
into its node-local memory.

Or simply use .  AFAICT the warning at the top of
that file isn't true any more.  A 4-byte counter on a 32-way should consume
just a little over 256 bytes.

