[PATCH] SGI 926917: make knfsd interact cleanly with HSMs

2005-03-14 Thread Greg Banks
G'day,

The NFSv3 protocol specifies an error, NFS3ERR_JUKEBOX, which a server
should return when an I/O operation will take a very long time.
This causes a different pattern of retries in clients, and avoids
a number of serious problems associated with I/Os which take longer
than an RPC timeout.  The Linux knfsd server has code to generate the
jukebox error and many NFS clients are known to have working code to
handle it.

One scenario in which a server should emit the JUKEBOX error is when
the file data which the client is attempting to access is managed by
an HSM (Hierarchical Storage Manager), is not present on the disk,
and needs to be brought in from tape.  Due to the nature of tapes this
operation can take minutes rather than the milliseconds normally seen
for local file data.

Currently the Linux knfsd handles this situation poorly.  A READ NFS
call will cause the nfsd thread handling it to block until the file
is available, without sending a reply to the NFS client.  After a
few seconds the client retries, and this second READ call causes
another nfsd to block behind the first one.  A few seconds later and
the client's retries have blocked *all* the nfsd threads, and all NFS
service from the server stops until the original file arrives on disk.

WRITEs and SETATTRs which truncate the file are marginally better, in
that the knfsd dupcache will catch the retries and drop them without
blocking an nfsd (the dupcache *will* catch the retries because the
cache entry remains in RC_INPROG state and is not reused until the
first call finishes).  However the first call still blocks, so given
WRITEs to enough offline files the server can still be locked up.

There are also client-side implications, depending on the client
implementation.  For example, on a Linux client an RPC retry loop uses
an RPC request slot, so reads from enough separate offline files can
lock up a mountpoint.

This patch seeks to remedy the interaction between knfsd and HSMs by
providing mechanisms to allow knfsd to tell an underlying filesystem
(which supports HSMs) not to block for reads, writes and truncates
of offline files.  It's a port of a Linux 2.4 patch used in SGI's
ProPack distro for the last 12 months.  The patch:

*  provides a new ATTR_NO_BLOCK flag which the kernel can
   use to tell a filesystem's inode_ops->setattr() operation not
   to block when truncating an offline file.  XFS already obeys
   this flag (inside an #ifdef).
   
*  changes knfsd to provide ATTR_NO_BLOCK when it does the VFS
   calls to implement the SETATTR NFS call.

*  changes knfsd to supply the O_NONBLOCK flag in the temporary
   struct file it uses for VFS reads and writes, in order to ask
   the filesystem not to block when reading or writing an offline
   file.  XFS already obeys this new semantic for O_NONBLOCK
   (and in SLES9 so does JFS).

*  adds code to translate the -EAGAIN the filesystem returns when
   it would have blocked, to the -ETIMEDOUT that knfsd expects.
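
The last item in the list can be sketched in isolation. This is only a
userspace illustration, not code from the patch: demo_hsm_errno() is a
hypothetical helper name, and in the patch the translation is done inline
in nfsd_setattr() and nfsd_read().

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the patch): map the -EAGAIN that a
 * filesystem returns for an offline file to the -ETIMEDOUT that knfsd
 * expects, so that the client sees the JUKEBOX error.
 * Every other value passes through unchanged.
 */
int demo_hsm_errno(int err)
{
	if (err == -EAGAIN)
		return -ETIMEDOUT;
	return err;
}
```

Keeping the mapping in one place like this is a design choice of the
sketch; the patch itself repeats the two-line check at each call site.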


Signed-off-by: Greg Banks <[EMAIL PROTECTED]>
---
 fs/nfsd/vfs.c  |   33 +++--
 include/linux/fs.h |1 +
 2 files changed, 32 insertions(+), 2 deletions(-)


Index: linux/fs/nfsd/vfs.c
===
--- linux.orig/fs/nfsd/vfs.c2005-03-07 13:13:57.0 +1100
+++ linux/fs/nfsd/vfs.c 2005-03-07 14:01:52.0 +1100
@@ -311,6 +311,16 @@ nfsd_setattr(struct svc_rqst *rqstp, str
goto out_nfserr;
}
DQUOT_INIT(inode);
+
+
+   /*
+* Tell a Hierarchical Storage Manager (e.g. via DMAPI) to
+* return EAGAIN when an action would take minutes instead of
+* milliseconds so that NFS can reply to the client with
+* NFSERR_JUKEBOX instead of blocking an nfsd thread.
+*/
+   if (rqstp->rq_vers == 3)
+   iap->ia_valid |= ATTR_NO_BLOCK;
}
 
imode = inode->i_mode;
@@ -333,6 +343,9 @@ nfsd_setattr(struct svc_rqst *rqstp, str
if (!check_guard || guardtime == inode->i_ctime.tv_sec) {
fh_lock(fhp);
err = notify_change(dentry, iap);
+   /* to get NFSERR_JUKEBOX on the wire, need -ETIMEDOUT */
+   if (err == -EAGAIN)
+   err = -ETIMEDOUT;
err = nfserrno(err);
fh_unlock(fhp);
}
@@ -671,6 +684,10 @@ nfsd_read(struct svc_rqst *rqstp, struct
if (ra)
file.f_ra = ra->p_ra;
 
+   /* Support HSMs -- see comment in nfsd_setattr() */
+   if (rqstp->rq_vers == 3)
+   file.f_flags |= O_NONBLOCK;
+
if (file.f_op->sendfile) {
svc_pushback_unused_pages(rqstp);
err = file.f_op->sendfile(, , *count,
@@ -694,8 +711,12 @@ nfsd_read(struct svc_rqst *rqstp, struct
*count = err;
err = 0;

Re: [PATCH] Per cpu irq stat

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Andrew Morton wrote:

> >  +DEFINE_PER_CPU(irq_cpustat_t, irq_stat)
> >  cacheline_maxaligned_in_smp;
>
> Why is this marked cacheline_maxaligned_in_smp?

In order to avoid potential false aliasing I guess. irq_cpustat_t is
already marked ____cacheline_aligned though which should be sufficient.
Shai?
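
For what it is worth, the point about the type already carrying the
alignment can be checked in userspace. The sketch below assumes a 64-byte
cache line and uses made-up names (demo_cpustat_t, demo_stat); it does not
model how the kernel lays out its per-CPU section, so it says nothing
about whether the extra attribute is needed there.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64	/* assumed cache line size for this sketch */

/* The type itself carries the alignment, as irq_cpustat_t does. */
typedef struct {
	unsigned int softirq_pending;	/* illustrative fields only */
	unsigned int apic_timer_irqs;
} __attribute__((aligned(DEMO_CACHELINE))) demo_cpustat_t;

/* No extra alignment attribute on the variable. */
demo_cpustat_t demo_stat[4];
```

Because the size of the type is padded to its alignment, every element of
demo_stat starts on its own cache line without any further annotation.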

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: bad pgd/pmd in latest BK on ia64

2005-03-14 Thread Benjamin Herrenschmidt
On Mon, 2005-03-14 at 15:31 -0800, David S. Miller wrote:
> On Mon, 14 Mar 2005 15:11:42 -0800
> "David S. Miller" <[EMAIL PROTECTED]> wrote:
> 
> > I therefore suspect the pgwalk patches.
> 
> I just noticed something else while reviewing this stuff.
> The PTRS_PER_PMD macros aren't used anymore, so my hacks
> to get 32-bit process VM operations optimized on sparc64
> aren't even being used any more, ho hum... :-)  There are
> better ways to do this.
> 
> (For the interested, see {REAL_}PTRS_PER_PMD in
>  include/asm-sparc64/pgtable.h)
> 
> Come to think of it, this may be related somehow to whatever
> is causing the problems.

That reminds me ... I still intend to toy with your old patches and add
some more abstract walkers & bitmap stuffs. Just no time at the moment. 

The main thing I want to change from your approach is instead of calling
a pte_work callback for every pte, call it for ranges of PTEs (that is
PTE pages most of the time). The goal here is to avoid the overhead of
the indirect function call (& additional stackframe junk etc...) on
every single PTE.
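
A rough userspace sketch of the range-based callback idea, to make the
difference concrete. All names here (demo_pte_t, demo_walk_ranges, and so
on) are hypothetical, and a flat array stands in for real page tables.

```c
#include <stddef.h>

typedef unsigned long demo_pte_t;	/* stand-in for a real pte_t */

/* Range callback: invoked once per run of entries, not once per entry. */
typedef void (*demo_range_fn)(demo_pte_t *start, size_t count, void *data);

#define DEMO_PTES_PER_PAGE 4	/* tiny on purpose; real PTE pages hold far more */

/* Walk 'total' entries, handing them to 'fn' one page-sized run at a time. */
size_t demo_walk_ranges(demo_pte_t *ptes, size_t total,
			demo_range_fn fn, void *data)
{
	size_t calls = 0;
	size_t done = 0;

	while (done < total) {
		size_t n = total - done;

		if (n > DEMO_PTES_PER_PAGE)
			n = DEMO_PTES_PER_PAGE;
		fn(ptes + done, n, data);	/* one indirect call per run */
		done += n;
		calls++;
	}
	return calls;
}

/* Example callback: count non-zero ("present") entries in the run. */
void demo_count_present(demo_pte_t *start, size_t count, void *data)
{
	size_t *present = data;
	size_t i;

	for (i = 0; i < count; i++)
		if (start[i])
			(*present)++;
}
```

With ten entries and four per run, the walker makes three indirect calls
instead of ten; the inner loop lives in the callback, where the compiler
can optimise it.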

Ben.




Re: [PATCH] reduce __deprecated spew

2005-03-14 Thread Arjan van de Ven

> (The intermodule_register and pm_register stuff has been hanging around for
> so long that one wonders if we need sterner stimuli, not lesser).

intermodule can just about go (one user left).. we could start by making
the intermodule.c file only build when that one user is selected (that
user is a corner case) to avoid others from accidentally starting to use
it again ...



Re: Fw: 2.6.11-rc5-mm1: reiser4 eating cpu time

2005-03-14 Thread Denis Vlasenko
On Sunday 13 March 2005 15:24, Alexander Gran wrote:
> Hi, 
> 
> Well, of course it cannot handle that large files (I wouldn't expect that, 
> either). My Problem is that when I open the file, it's not just kwrite but 
> other processes that need so much cpu time. That kwrite is eating cpu is ok. 
> I cannot reproduce the behaviour for some reason however. 
> So for short what's now (2.6.11-mm3) happening:
> I open a file of 150MB with kwrite. Kwrite starts using all the cpu it can get.
> After some seconds pdflush kicks in. Kwrite seems to wait, and pdflush is 
> eating cpu cycles. These 2 alternate for some time, until the file is loaded. 

I bet kwrite does something silly.

Use strace -tt to find out whether kwrite spends that much CPU
by doing zillions of syscalls or not. 
--
vda



Re: [PATCH] Per cpu irq stat

2005-03-14 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> The definition of the irq_stat as an array means that the individual
>  elements of the irq_stat array are located on one NUMA node requiring
>  internode traffic to access irq_stat from other nodes. This patch makes
>  irq_stat a per_cpu variable which allows most accesses to be local.

OK...

The wordwrapping monster got at your patch, but I fixed it up.

>  +DEFINE_PER_CPU(irq_cpustat_t, irq_stat)
>  cacheline_maxaligned_in_smp;

Why is this marked cacheline_maxaligned_in_smp?


Re: [PATCH] reduce __deprecated spew

2005-03-14 Thread Andrew Morton
Matt Mackall <[EMAIL PROTECTED]> wrote:
>
>  This patch changes a couple of the noisier deprecations to only warn
>  on the primary entrypoint (in these cases, the _register functions).
>  This approach makes it obvious that an interface is going away while
>  only warning once per user. I suggest we adopt this approach for
>  future deprecation campaigns.

But that's going to warn when the deprecated function itself is compiled,
isn't it?

If so, that's backwards.  We want to warn when the deprecated function is
_used_, so people go fix up their code, and we can then remove the
deprecated function.

(The intermodule_register and pm_register stuff has been hanging around for
so long that one wonders if we need sterner stimuli, not lesser).
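
To illustrate the "warn on use" behaviour being asked for: with GCC, the
deprecated attribute on a declaration produces a warning at each call
site and none where the function itself is defined. The names below are
made up for the sketch and are not the pm_* or inter_module_* functions.

```c
/* Old entry point: every caller gets a compile-time deprecation warning. */
int demo_old_register(int id) __attribute__((deprecated));
/* Replacement entry point: no warning. */
int demo_new_register(int id);

int demo_old_register(int id)	/* defining it does not warn */
{
	return id + 1;
}

int demo_new_register(int id)
{
	return id + 1;
}

/*
 * A call such as demo_old_register(3) anywhere in the tree is what
 * triggers the warning, which points people at the code they need to fix.
 */
```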


Re: [PATCH] ES7000 Legacy Mappings Update

2005-03-14 Thread Andrey Panin
On 073, 03 14, 2005 at 06:05:54PM -0800, Andrew Morton wrote:
> 
> You triggered my trivia twitch.
> 
> Jason Davis <[EMAIL PROTECTED]> wrote:
> >
> >  -   * ES7000 has no legacy identity mappings
> >  +   * Older generations of ES7000 have no legacy identity mappings
> >  */
> >  -  if (es7000_plat)
> >  +  if (es7000_plat && es7000_plat < 2) 
> > return;
> 
> Why not
> 
>   if (es7000_plat == 1)
> 
> ?
> 
> > /* 
> >  diff -Naurp linux-2.6.11.3/arch/i386/mach-es7000/es7000plat.c 
> > linux-2.6.11.3-legacy/arch/i386/mach-es7000/es7000plat.c
> >  --- linux-2.6.11.3/arch/i386/mach-es7000/es7000plat.c  2005-03-13 
> > 01:44:41.0 -0500
> >  +++ linux-2.6.11.3-legacy/arch/i386/mach-es7000/es7000plat.c   
> > 2005-03-14 11:52:44.0 -0500
> >  @@ -138,7 +138,14 @@ parse_unisys_oem (char *oemptr, int oem_
> > es7000_plat = 0;
> > } else {
> > printk("\nEnabling ES7000 specific features...\n");
> >  -  es7000_plat = 1;
> >  +  /*
> >  +   * Check to see if this is a x86_64 ES7000 machine.
> >  +   */
> >  +  if (!(boot_cpu_data.x86 <= 15 && boot_cpu_data.x86_model <= 2))
> >  +  es7000_plat = 2;
> >  +  else
> >  +  es7000_plat = 1;
> >  +
> 
> Perhaps some nice enumerated identifiers here, rather than magic numbers?
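
One possible shape for the enumerated identifiers suggested above. The
names are hypothetical; only the values 0, 1 and 2 and their meaning come
from the quoted patch.

```c
/* Hypothetical names; the patch under discussion uses 0, 1 and 2. */
enum demo_es7000_plat {
	DEMO_ES7000_NONE    = 0,	/* not an ES7000 */
	DEMO_ES7000_CLASSIC = 1,	/* older generation, no legacy mappings */
	DEMO_ES7000_X86_64  = 2,	/* x86_64 ES7000 machine */
};

/* Mirrors "if (es7000_plat == 1) return;" with a named constant. */
int demo_skip_legacy_mappings(enum demo_es7000_plat plat)
{
	return plat == DEMO_ES7000_CLASSIC;
}
```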

While you are looking at this code can you take a look at the attached
trivial patch ?

-- 
Andrey Panin| Linux and UNIX system administrator
[EMAIL PROTECTED]   | PGP key: wwwkeys.pgp.net

This patch moves es7000_plat global variable out of DMI code.

Signed-off-by: Andrey Panin <[EMAIL PROTECTED]>

 arch/i386/kernel/dmi_scan.c |2 --
 arch/i386/kernel/mpparse.c  |1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff -urdpNX /usr/share/dontdiff 
linux-2.6.11.vanilla/arch/i386/kernel/dmi_scan.c 
linux-2.6.11/arch/i386/kernel/dmi_scan.c
--- linux-2.6.11.vanilla/arch/i386/kernel/dmi_scan.c2005-03-08 
18:02:00.0 +0300
+++ linux-2.6.11/arch/i386/kernel/dmi_scan.c2005-03-08 18:04:38.0 
+0300
@@ -12,8 +12,6 @@
 #include 
 
 
-int es7000_plat = 0;
-
 struct dmi_header
 {
u8  type;
diff -urdpNX /usr/share/dontdiff 
linux-2.6.11.vanilla/arch/i386/kernel/mpparse.c 
linux-2.6.11/arch/i386/kernel/mpparse.c
--- linux-2.6.11.vanilla/arch/i386/kernel/mpparse.c 2005-03-02 
10:37:53.0 +0300
+++ linux-2.6.11/arch/i386/kernel/mpparse.c 2005-03-08 18:05:28.0 
+0300
@@ -982,6 +982,7 @@ void __init mp_override_legacy_irq (
return;
 }
 
+int es7000_plat;
 
 void __init mp_config_acpi_legacy_irqs (void)
 {


[PATCH] reduce __deprecated spew

2005-03-14 Thread Matt Mackall
This patch changes a couple of the noisier deprecations to only warn
on the primary entrypoint (in these cases, the _register functions).
This approach makes it obvious that an interface is going away while
only warning once per user. I suggest we adopt this approach for
future deprecation campaigns.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: bk/include/linux/pm.h
===
--- bk.orig/include/linux/pm.h  2005-03-14 22:14:59.0 -0800
+++ bk/include/linux/pm.h   2005-03-14 22:17:48.0 -0800
@@ -108,17 +108,17 @@ struct pm_dev __deprecated *pm_register(
 /*
  * Unregister a device with power management
  */
-void __deprecated pm_unregister(struct pm_dev *dev);
+void /*deprecated*/ pm_unregister(struct pm_dev *dev);
 
 /*
  * Unregister all devices with matching callback
  */
-void __deprecated pm_unregister_all(pm_callback callback);
+void /*deprecated*/ pm_unregister_all(pm_callback callback);
 
 /*
  * Send a request to all devices
  */
-int __deprecated pm_send_all(pm_request_t rqst, void *data);
+int /*deprecated*/ pm_send_all(pm_request_t rqst, void *data);
 
 #else /* CONFIG_PM */
 
Index: bk/include/linux/module.h
===
--- bk.orig/include/linux/module.h  2005-03-14 22:14:59.0 -0800
+++ bk/include/linux/module.h   2005-03-14 22:17:50.0 -0800
@@ -562,9 +562,9 @@ __MODULE_PARM_TYPE(var, type);
 #define HAVE_INTER_MODULE
 extern void __deprecated inter_module_register(const char *,
struct module *, const void *);
-extern void __deprecated inter_module_unregister(const char *);
-extern const void * __deprecated inter_module_get_request(const char *,
+extern void /*deprecated*/ inter_module_unregister(const char *);
+extern const void * /*deprecated*/ inter_module_get_request(const char *,
const char *);
-extern void __deprecated inter_module_put(const char *);
+extern void /*deprecated*/ inter_module_put(const char *);
 
 #endif /* _LINUX_MODULE_H */


-- 
Mathematics is the supreme nostalgia of our time.


[PATCH] Per cpu irq stat

2005-03-14 Thread Christoph Lameter
The definition of the irq_stat as an array means that the individual
elements of the irq_stat array are located on one NUMA node requiring
internode traffic to access irq_stat from other nodes. This patch makes
irq_stat a per_cpu variable which allows most accesses to be local.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Shai Fultheim <[EMAIL PROTECTED]>

Index: linux-2.6.11/arch/i386/kernel/apic.c
===
--- linux-2.6.11.orig/arch/i386/kernel/apic.c   2005-03-01
23:38:33.0 -0800
+++ linux-2.6.11/arch/i386/kernel/apic.c2005-03-08
15:01:43.0 -0800
@@ -1165,7 +1165,7 @@ fastcall void smp_apic_timer_interrupt(s
/*
 * the NMI deadlock-detector uses this.
 */
-   irq_stat[cpu].apic_timer_irqs++;
+   per_cpu(irq_stat, cpu).apic_timer_irqs++;

/*
 * NOTE! We'd better ACK the irq immediately,
Index: linux-2.6.11/arch/i386/kernel/io_apic.c
===
--- linux-2.6.11.orig/arch/i386/kernel/io_apic.c2005-03-01
23:37:54.0 -0800
+++ linux-2.6.11/arch/i386/kernel/io_apic.c 2005-03-08
16:56:24.923078776 -0800
@@ -275,7 +275,7 @@ struct irq_cpu_info {
 #define IRQ_DELTA(cpu,irq) (irq_cpu_data[cpu].irq_delta[irq])

 #define IDLE_ENOUGH(cpu,now) \
-   (idle_cpu(cpu) && ((now) -
irq_stat[(cpu)].idle_timestamp > 1))
+   (idle_cpu(cpu) && ((now) - per_cpu(irq_stat,
(cpu)).idle_timestamp > 1))

 #define IRQ_ALLOWED(cpu, allowed_mask) cpu_isset(cpu, allowed_mask)

Index: linux-2.6.11/arch/i386/kernel/irq.c
===
--- linux-2.6.11.orig/arch/i386/kernel/irq.c2005-03-01
23:37:48.0 -0800
+++ linux-2.6.11/arch/i386/kernel/irq.c 2005-03-08 17:57:13.623392016
-0800
@@ -16,6 +16,9 @@
 #include 
 #include 

+DEFINE_PER_CPU(irq_cpustat_t, irq_stat)
cacheline_maxaligned_in_smp;
+EXPORT_PER_CPU_SYMBOL(irq_stat);
+
 #ifndef CONFIG_X86_LOCAL_APIC
 /*
  * 'what should we do if we get a hw irq event on an illegal vector'.
@@ -246,7 +249,7 @@ skip:
for (j = 0; j < NR_CPUS; j++)
if (cpu_online(j))
seq_printf(p, "%10u ",
-   irq_stat[j].apic_timer_irqs);
+
per_cpu(irq_stat,j).apic_timer_irqs);
seq_putc(p, '\n');
 #endif
seq_printf(p, "ERR: %10u\n",
atomic_read(_err_count));
Index: linux-2.6.11/arch/i386/kernel/nmi.c
===
--- linux-2.6.11.orig/arch/i386/kernel/nmi.c2005-03-01
23:38:10.0 -0800
+++ linux-2.6.11/arch/i386/kernel/nmi.c 2005-03-08 15:01:43.0
-0800
@@ -110,7 +110,7 @@ int __init check_nmi_watchdog (void)
printk(KERN_INFO "testing NMI watchdog ... ");

for (cpu = 0; cpu < NR_CPUS; cpu++)
-   prev_nmi_count[cpu] = irq_stat[cpu].__nmi_count;
+   prev_nmi_count[cpu] = per_cpu(irq_stat,
cpu).__nmi_count;
local_irq_enable();
mdelay((10*1000)/nmi_hz); // wait 10 ticks

@@ -483,7 +483,7 @@ void nmi_watchdog_tick (struct pt_regs *
 */
int sum, cpu = smp_processor_id();

-   sum = irq_stat[cpu].apic_timer_irqs;
+   sum = per_cpu(irq_stat, cpu).apic_timer_irqs;

if (last_irq_sums[cpu] == sum) {
/*
Index: linux-2.6.11/arch/i386/kernel/process.c
===
--- linux-2.6.11.orig/arch/i386/kernel/process.c2005-03-08
15:01:42.0 -0800
+++ linux-2.6.11/arch/i386/kernel/process.c 2005-03-08
18:06:03.695808760 -0800
@@ -161,7 +161,7 @@ void cpu_idle (void)
if (!idle)
idle = default_idle;

-   irq_stat[cpu].idle_timestamp = jiffies;
+   __get_cpu_var(irq_stat).idle_timestamp =
jiffies;
idle();
}
schedule();
Index: linux-2.6.11/include/asm-i386/hardirq.h
===
--- linux-2.6.11.orig/include/asm-i386/hardirq.h2005-03-01
23:38:17.0 -0800
+++ linux-2.6.11/include/asm-i386/hardirq.h 2005-03-08
18:10:52.545896872 -0800
@@ -12,8 +12,13 @@ typedef struct {
unsigned int apic_timer_irqs;   /* arch dependent */
 } cacheline_aligned irq_cpustat_t;

-#include  /* Standard mappings for irq_cpustat_t
above */
+DECLARE_PER_CPU(irq_cpustat_t, irq_stat);
+extern irq_cpustat_t irq_stat[];
+
+#define __ARCH_IRQ_STAT
+#define __IRQ_STAT(cpu, member) (per_cpu(irq_stat, cpu).member)

 void ack_bad_irq(unsigned int irq);
+#include 

 #endif /* __ASM_HARDIRQ_H */

[Announce] sg3_utils-1.13 available

2005-03-14 Thread Douglas Gilbert
sg3_utils is a package of command line utilities for sending
SCSI commands to devices. This package targets the lk 2.6 and
lk 2.4 series. In the lk 2.6 series these utilities (except
sgp_dd) can be used with any devices that support the SG_IO
ioctl.
This version adds sg_format which can format SCSI disks,
potentially with a different number of bytes in each block.
It can also resize (sometimes called "short stroke") a
disk. There are also extensions to the sg_dd utility to
use the READ LONG SCSI command on damaged blocks (for a "last
resort" media copy).
A tarball, rpm and deb can be found on:
http://www.torque.net/sg .
For an overview of sg3_utils see this page:
http://www.torque.net/sg/u_index.html
The sg_dd utility has its own page at:
http://www.torque.net/sg/sg_dd.html
A changelog can be found at:
http://www.torque.net/sg/p/sg3_utils.CHANGELOG
A release announcement has been sent to freshmeat.net .
Doug Gilbert


[ANNOUNCE][RFC] PlugSched-3.0.2 for 2.6.11-mm3 (includes nicksched)

2005-03-14 Thread Peter Williams
A patch of PlugSched-3.0.2 (containing ingosched, staircase,
spa_no_frills, zaphod and nicksched CPU schedulers) against a 2.6.11-mm3 
kernel is available for download from:


Peter
--
Peter Williams   [EMAIL PROTECTED]
"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce


Re: SMbus not enabled

2005-03-14 Thread Andrew Morton
"Enrico Bartky" <[EMAIL PROTECTED]> wrote:
>
> my notebook have a SiS 964 Chipset and "quirked" by "quirk_sis_503", ...
> but there is no SMbus device. If I add a call to the "quirk_sis_96x_smbus"
> function directly from the "quirk_sis_503" function, the smbus is present,
> but I think a call to a quirk from a quirk is not optimal. Is there a better
> solution?

(Please wrap your email lines before column 80)

What version of the kernel are you using?

I assume that you mean that the machine does have SMBus, but that it is not
being recognised by the kernel?

It could be that we don't have the appropriate PCI IDs in there.  Please
run `lspci -vvxx' and send the part which is relevant to the SMBus
interface.

Also, in drivers/pci/quirks.c you can change `#undef DEBUG' to `#define
DEBUG' and it will print useful information.

This patch will help, too:

--- 25/drivers/pci/quirks.c~a   2005-03-14 22:23:08.0 -0800
+++ 25-akpm/drivers/pci/quirks.c2005-03-14 22:23:57.0 -0800
@@ -1262,6 +1262,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f, struct 
pci_fixup *end)
 {
while (f < end) {
+   pr_debug("PCI: quirks: inspecting %04x:%04x\n",
+   dev->vendor, dev->device);
if ((f->vendor == dev->vendor || f->vendor == (u16) PCI_ANY_ID) 
&&
(f->device == dev->device || f->device == (u16) 
PCI_ANY_ID)) {
pr_debug("PCI: Calling quirk %p for %s\n", f->hook, 
pci_name(dev));
_
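
For readers following along, the matching loop in the hunk above boils
down to something like the following userspace sketch. The struct and
function names are invented for illustration; 0x1039 is the SiS vendor
ID, while the device IDs are placeholders and not taken from quirks.c.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ANY_ID 0xffffu	/* stand-in for (u16) PCI_ANY_ID */

struct demo_dev {
	uint16_t vendor, device;
	int fixed;
};

struct demo_fixup {
	uint16_t vendor, device;
	void (*hook)(struct demo_dev *dev);
};

void demo_mark_fixed(struct demo_dev *dev)
{
	dev->fixed++;
}

/* Run every hook whose vendor/device pair matches, wildcards included. */
int demo_do_fixups(struct demo_dev *dev, const struct demo_fixup *f,
		   const struct demo_fixup *end)
{
	int called = 0;

	for (; f < end; f++) {
		if ((f->vendor == dev->vendor || f->vendor == DEMO_ANY_ID) &&
		    (f->device == dev->device || f->device == DEMO_ANY_ID)) {
			f->hook(dev);
			called++;
		}
	}
	return called;
}
```

A quirk registered under a device ID the machine does not report simply
never matches, which is why the lspci output is the first thing to check.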



Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread Christoph Lameter
On Fri, 11 Mar 2005, john stultz wrote:

> +/* cyc2ns():
> + *   Uses the timesource and ntp adjustment interval to
> + *   convert cycle_ts to nanoseconds.
> + *   If rem is not null, it stores the remainder of the
> + *   calculation there.
> + *
> + */

This function is called in critical paths and it would be very important
to optimize it further.

> +static inline nsec_t cyc2ns(struct timesource_t* ts, int ntp_adj, cycle_t 
> cycles, cycle_t* rem)
> +{
> + u64 ret;
> + ret = (u64)cycles;
> + ret *= (ts->mult + ntp_adj);

This only changes when ntp_adj changes. Maybe maintain the sum separately?

> + if (unlikely(rem)) {
> + /* XXX clean this up later!
> +  *  but for now relax, we only calc
> +  *  remainders at interrupt time
> +  */
> + u64 remainder = ret & ((1 << ts->shift) -1);
> + do_div(remainder, ts->mult);
> + *rem = remainder;

IA64 does not do remainder processing (maybe I just do not understand
this...) but this seems to be unnecessary if one uses 64 bit values that
are properly shifted?

> + }
> + ret >>= ts->shift;
> + return (nsec_t)ret;
> +}

The whole function could simply be:

#define cyc2ns(cycles, ts) (((cycles) * (ts)->current_factor) >> (ts)->shift)
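
Spelled out as functions, the suggestion might look like the sketch below.
The struct layout and the demo_* names are assumptions for illustration;
only mult, shift and the idea of a cached current_factor come from this
thread. Note that the 64-bit product can overflow for large cycle counts,
which the sketch does not guard against.

```c
#include <stdint.h>

/* Hypothetical layout; only mult and shift appear in the quoted patch. */
struct demo_timesource {
	uint32_t mult;
	uint32_t shift;
	uint64_t current_factor;	/* cached mult + ntp_adj */
};

/* Recompute the cached factor only when the NTP adjustment changes. */
void demo_set_ntp_adj(struct demo_timesource *ts, int ntp_adj)
{
	ts->current_factor = (uint64_t)((int64_t)ts->mult + ntp_adj);
}

/* Hot path: one multiply and one shift. */
uint64_t demo_cyc2ns(const struct demo_timesource *ts, uint64_t cycles)
{
	return (cycles * ts->current_factor) >> ts->shift;
}
```

With mult = 2560 and shift = 10 (2.5 ns per cycle), four cycles convert to
10 ns; the addition of ntp_adj is paid once per adjustment rather than on
every conversion.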


Re: [PATCH] mm counter operations through macros

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Andrew Morton wrote:

> >  Then you won't be able to get rid of the counters by
> >
> >  #define MM_COUNTER(xx)
> >
> >  anymore.
>
> Why would we want to do that?

If counters are calculated on demand then no counter is
necessary.
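
A sketch of what is meant, under assumptions: the non-empty expansion of
MM_COUNTER, the DEMO_COUNTERS_ON_DEMAND switch and struct demo_mm are
invented here for illustration. Only the empty "#define MM_COUNTER(xx)"
and the inc_mm_counter() form come from this thread.

```c
#ifdef DEMO_COUNTERS_ON_DEMAND
/* Counters computed on demand: the field and the updates both vanish. */
#define MM_COUNTER(name)
#define inc_mm_counter(mm, member)	do { } while (0)
#else
/* Counters stored in the structure and updated in place. */
#define MM_COUNTER(name)		unsigned long _##name;
#define inc_mm_counter(mm, member)	((mm)->_##member++)
#endif

struct demo_mm {
	int users;		/* unrelated field, always present */
	MM_COUNTER(rss)
};
```

Callers only ever go through the macros, so flipping the switch removes
the storage without touching them.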



RE: [ACPI] [PATCH, new ACPI driver] new sony_acpi driver

2005-03-14 Thread Yu, Luming
 
Basically, this driver just calls some specific AML methods for hotkey functions.
That can be achieved through the generic hotkey driver filed at
http://bugzilla.kernel.org/show_bug.cgi?id=3887.
So I don't think this driver is needed.

>-Original Message-
>From: [EMAIL PROTECTED] 
>[mailto:[EMAIL PROTECTED] On Behalf Of 
>Stelian Pop
>Sent: 2005年2月11日 0:18
>To: Linux Kernel Mailing List
>Cc: Andrew Morton; [EMAIL PROTECTED]
>Subject: [ACPI] [PATCH, new ACPI driver] new sony_acpi driver
>
>Hi,
>
>This driver has been submitted (almost unchanged) on lkml and 
>on acpi-devel twice, first on July 21, 2004, then again on
>September 17, 2004. It has been quietly ignored.
>
>Privately I've had much positive feedback from users of this driver
>(and no negative feedback), including Linux distributions who wish
>to include it into their kernels. The reports are increasing in number,
>it would seem that newer Sony Vaios are more and more incompatible
>with sonypi and require sony_acpi to control the screen brightness.
>
>Please integrate this patch in -mm for wider testing and into
>the ACPI tree.
>
>Original announcement follows below.
>
>Thanks,
>
>Stelian.
>
>PS: I am also going to submit a bugzilla RFE for the acpi people,
>I have been told they are more receptive to that.
>
>--
>Most of the Sony Vaio owners are happy with the current sonypi
>driver, which makes them able to get/set the screen brightness,
>capture the jogdial and/or special key events etc.
>
>However, some newer Vaio series (FX series, and not only those) lack
>a SPIC device in their ACPI BIOS making the sonypi driver unusable
>for them.
>
>Fortunately, there is another ACPI device, called SNC (for Sony
>Notebook Control) which seems to be present in all Vaios, which
>can be used to access some low-level laptop functions. From what
>I understood, the SPIC device itself is built on top of SNC.
>
>The SNC device is able to drive the screen brightness, and probably
>more (what else it does is not yet known). The attached driver is a
>first shot at using the SNC directly.
>
>In the default mode, the sony_acpi driver lets the user get/set the
>screen brightness, and only that.
>
>The screen is one of the most important power consumers in a laptop,
>so being able to set its brightness is very important for many users,
>making this driver useful even if it does only that.
>
>In the debug/developer mode (which can be activated with a module
>option), the driver lets the user see a few other knobs, whose
>effects are however unknown. Using the debug mode we may hopefully
>find what those knobs do and offer that extra functionality in
>future versions of the driver (if someone at Sony is listening,
>you know what we need from you...)
>
>This driver does not interact with the current sonypi driver, both
>drivers can be used at the same time.
>
>Signed-off-by: Stelian Pop <[EMAIL PROTECTED]>
>
>--- /dev/null  2005-02-10 10:35:32.824183288 +0100
>+++ linux-2.6-stelian/drivers/acpi/sony_acpi.c 2005-01-31 
>17:05:53.0 +0100
>@@ -0,0 +1,442 @@
>+/*
>+ * ACPI Sony Notebook Control Driver (SNC)
>+ *
>+ * Copyright (C) 2004 Stelian Pop <[EMAIL PROTECTED]>
>+ * 
>+ * Parts of this driver inspired from asus_acpi.c, which is 
>+ * Copyright (C) 2002, 2003, 2004 Julien Lerouge, Karol Kozimor
>+ *
>+ * This program is free software; you can redistribute it 
>and/or modify
>+ * it under the terms of the GNU General Public License as 
>published by
>+ * the Free Software Foundation; either version 2 of the License, or
>+ * (at your option) any later version.
>+ * 
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+ * GNU General Public License for more details.
>+ * 
>+ * You should have received a copy of the GNU General Public License
>+ * along with this program; if not, write to the Free Software
>+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>+ *
>+ */
>+
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+
>+#define ACPI_SNC_CLASS"sony"
>+#define ACPI_SNC_HID  "SNY5001"
>+#define ACPI_SNC_DRIVER_NAME  "ACPI Sony Notebook Control Driver v0.1"
>+
>+MODULE_AUTHOR("Stelian Pop");
>+MODULE_DESCRIPTION(ACPI_SNC_DRIVER_NAME);
>+MODULE_LICENSE("GPL");
>+
>+static int debug = 0;
>+module_param(debug, int, 0);
>+MODULE_PARM_DESC(debug,"set this to 1 (and RTFM) if you want 
>to help the development of this driver");
>+
>+static int sony_acpi_add (struct acpi_device *device);
>+static int sony_acpi_remove (struct acpi_device *device, int type);
>+
>+static struct acpi_driver sony_acpi_driver = {
>+  name:   ACPI_SNC_DRIVER_NAME,
>+  class:  ACPI_SNC_CLASS,
>+  ids:ACPI_SNC_HID,
>+  ops:{
>+  add:sony_acpi_add,

Re: huge filesystems

2005-03-14 Thread Andreas Dilger
On Mar 14, 2005  21:37 -0700, jmerkey wrote:
> 1. Scaling issues with readdir() with huge numbers of files (not even 
> huge really. 87000 files in a dir takes a while
> for readdir() to return results). I average 2-3 million files per 
> directory on 2.6.9. It can take up to a minute for
> readdir() to return from initial reading from one of these directories 
> with readdir() through the VFS.

Actually, unless I'm mistaken the problem is that "ls" (even when you
ask it not to sort entries) is doing readdir on the whole directory
before returning any results.  We see this with Lustre and very large
directories.  Run strace on "ls" and it is doing masses of readdirs, but
no output to stdout.  Lustre readdir works OK on directories up to 10M
files, but ls sucks.

$ strace ls /usr/lib 2>&1 > /dev/null
:
:
open("/usr/lib", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=57344, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, /* 120 entries */, 4096)  = 4096
getdents64(3, /* 65 entries */, 4096)   = 2568
getdents64(3, /* 111 entries */, 4096)  = 4088
:
:
getdents64(3, /* 59 entries */, 4096)   = 2152
getdents64(3, /* 10 entries */, 4096)   = 496
getdents64(3, /* 0 entries */, 4096)= 0
close(3)= 0
write(1, "Acrobat5\nalchemist\nanaconda\nanac"..., 4096) = 4096
write(1, "libbonobo-2.a\nlibbonobo-2.so\nlib"..., 4096) = 4096
write(1, "ibgdbm.so\nlibgdbm.so.2\nlibgdbm.s"..., 4096) = 4096
write(1, "nica_qmxxx.la\nlibgphoto_konica_q"..., 4096) = 4096
write(1, ".so\nlibIDL-2.so.0\nlibIDL-2.so.0."..., 4096) = 4096
write(1, "libkpilot.so.0\nlibkpilot.so.0.0."..., 4096) = 4096
write(1, "ove.so.0\nlibospgrove.so.0.0.0\nli"..., 4096) = 4096
write(1, ".6\nlibsoundserver_idl.la\nlibsoun"..., 4096) = 4096
write(1, "lparse.so.0\nlibxmlparse.so.0.1.0"..., 1294) = 1294
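
By contrast, a tool that wants to show results early can emit each name
as it is read. The function below is a minimal sketch with an invented
name (demo_stream_dir); it is not how ls is implemented. The C library
still fetches entries from the kernel in batches, so the getdents64 calls
would look the same, but the output would be interleaved with them.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Print each directory entry as soon as readdir() returns it, instead of
 * collecting the whole directory first. Returns the number of entries
 * printed, or -1 if the directory cannot be opened.
 */
long demo_stream_dir(const char *path, FILE *out)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	long count = 0;

	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		fprintf(out, "%s\n", ent->d_name);
		fflush(out);	/* hand each name to the reader immediately */
		count++;
	}
	closedir(dir);
	return count;
}
```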


> 6. fdisk does not support drives larger than 2TB, so I have to hack the 
> partition tables and fake out dsfs with 3TB and 4TB drives created with
> RAID 0 controllers and hardware. This needs to get fixed.

Use a different partition format (e.g. EFI or devicemapper) or none at all.
That is better than just ignoring the whole thing and some user thinking
"gee, I have all this free space here, maybe I'll make another partition".

Cheers, Andreas
--
Andreas Dilger
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
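
As background to the 2TB limit: an MSDOS partition table entry stores the
partition start and length as 32-bit sector counts. A rough sketch of the
resulting ceiling, assuming the usual 512-byte sectors (the helper name is
invented for this example):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest partition length, in bytes, that a 32-bit sector count can
 * describe. sector_size is supplied by the caller; 512 is the common
 * value assumed in the discussion above.
 */
uint64_t msdos_max_partition_bytes(uint32_t sector_size)
{
	assert(sector_size != 0);
	return (uint64_t)UINT32_MAX * sector_size;
}
```

With 512-byte sectors this comes out just under 2 TiB, which is why a 3TB
or 4TB device needs a different partition format or none at all.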





Re: [PATCH] mm counter operations through macros

2005-03-14 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> On Mon, 14 Mar 2005, Andrew Morton wrote:
> 
>  > I don't think the MM_COUNTER_T macro adds much, really.  How about this?
> 
>  Then you won't be able to get rid of the counters by
> 
>  #define MM_COUNTER(xx)
> 
>  anymore.

Why would we want to do that?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [2.6 patch] net/802/fc.c: #if 0 fc_type_trans

2005-03-14 Thread David S. Miller
On Sun, 6 Mar 2005 21:57:54 +0100
Adrian Bunk <[EMAIL PROTECTED]> wrote:

> The only user of fc_type_trans (drivers/net/fc/iph5526.c) is BROKEN in 
> 2.6 and removed in -mm.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

That driver isn't in Linus's tree any longer either.  Just delete
the thing altogether instead of #if 0'ing it.

Thanks.


Re: [PATCH] mm counter operations through macros

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Andrew Morton wrote:

> I don't think the MM_COUNTER_T macro adds much, really.  How about this?

Then you won't be able to get rid of the counters by

#define MM_COUNTER(xx)

anymore.

>
> --- 25/include/linux/sched.h~mm-counter-operations-through-macros-tidy  2005-03-14 21:43:00.0 -0800
> +++ 25-akpm/include/linux/sched.h 2005-03-14 21:43:00.0 -0800
> @@ -210,7 +210,6 @@ extern void arch_unmap_area_topdown(stru
>  #define inc_mm_counter(mm, member) (mm)->_##member++
>  #define dec_mm_counter(mm, member) (mm)->_##member--
>  typedef unsigned long mm_counter_t;
> -#define MM_COUNTER_T(member) mm_counter_t _##member
>
>  struct mm_struct {
>   struct vm_area_struct * mmap;   /* list of VMAs */
> @@ -241,8 +240,8 @@ struct mm_struct {
>   unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;
>
>   /* Special counters protected by the page_table_lock */
> - MM_COUNTER_T(rss);
> - MM_COUNTER_T(anon_rss);
> + mm_counter_t _rss;
> + mm_counter_t _anon_rss;
>
>   unsigned long saved_auxv[42]; /* for /proc/PID/auxv */
>
> _
>
>


Re: [Fastboot] Re: Query: Kdump: Core Image ELF Format

2005-03-14 Thread Vivek Goyal
On Wed, 2005-03-09 at 23:56 -0700, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > I want to fill the virtual addresses of linearly mapped region. That is
> > physical addresses from 0 to MAXMEM (896 MB) are mapped by kernel at
> > virtual addresses PAGE_OFFSET to (PAGE_OFFSET + MAXMEM). Values of
> > PAGE_OFFSET and MAXMEM are already known and hard-coded.
> 
> PAGE_OFFSET has a common value of 0xc0000000 on x86.  However
> that value is by no means fixed.  The 4G/4G split changes it
> as do some other patches floating around at the time.
> On x86-64 I don't know how stable those kinds of offsets are.

Agreed. Then how about exporting this information to user space,
probably through sysfs? Maybe the range of the linearly mapped region
could be exported (PAGE_OFFSET to (PAGE_OFFSET + x)).

>  
> > I think I used the terminology kernel virtual address and that is adding
> > to the confusion. Kernel virtual addresses are not necessarily linearly
> > mapped. What I meant was kernel logical addresses whose associated
> > physical addresses differ only by a constant offset.
> 
> I know what you meant.  I simply meant that things don't look that
> constant to me.  Especially in Linux where there are enough people
> to try most of the reasonable possibilities.
> 
> I don't even think it is a bad idea.  But I do think we have a different
> idea of what is constant.
> 
> Eric
> 
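
To make the "constant offset" idea concrete, here is a small sketch of the
conversion for the linearly mapped region. The two constants are
assumptions for a default i386 3G/1G split; as Eric points out, they are
not fixed across configurations, so real tooling would have to obtain them
from the running kernel. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a default i386 configuration; not universal. */
#define EXAMPLE_PAGE_OFFSET 0xC0000000ULL
#define EXAMPLE_MAXMEM      (896ULL << 20)	/* 896 MB of linearly mapped memory */

/* Kernel logical address for a physical address in the linear region. */
uint64_t phys_to_logical(uint64_t phys)
{
	assert(phys < EXAMPLE_MAXMEM);
	return phys + EXAMPLE_PAGE_OFFSET;
}

/* Inverse mapping: subtract the same constant offset. */
uint64_t logical_to_phys(uint64_t logical)
{
	assert(logical >= EXAMPLE_PAGE_OFFSET);
	assert(logical - EXAMPLE_PAGE_OFFSET < EXAMPLE_MAXMEM);
	return logical - EXAMPLE_PAGE_OFFSET;
}
```

Anything outside that window (vmalloc space, highmem) has no such fixed
offset, which is the distinction between "virtual" and "logical"
addresses drawn above.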



Re: [PATCH] mm counter operations through macros

2005-03-14 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
>  This patch extracts all the operations on counters protected by the
>  page table lock (currently rss and anon_rss) into definitions in
>  include/linux/sched.h. All rss operations are performed through
>  the following macros:

I don't think the MM_COUNTER_T macro adds much, really.  How about this?

--- 25/include/linux/sched.h~mm-counter-operations-through-macros-tidy  2005-03-14 21:43:00.0 -0800
+++ 25-akpm/include/linux/sched.h   2005-03-14 21:43:00.0 -0800
@@ -210,7 +210,6 @@ extern void arch_unmap_area_topdown(stru
 #define inc_mm_counter(mm, member) (mm)->_##member++
 #define dec_mm_counter(mm, member) (mm)->_##member--
 typedef unsigned long mm_counter_t;
-#define MM_COUNTER_T(member) mm_counter_t _##member
 
 struct mm_struct {
struct vm_area_struct * mmap;   /* list of VMAs */
@@ -241,8 +240,8 @@ struct mm_struct {
unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;
 
/* Special counters protected by the page_table_lock */
-   MM_COUNTER_T(rss);
-   MM_COUNTER_T(anon_rss);
+   mm_counter_t _rss;
+   mm_counter_t _anon_rss;
 
unsigned long saved_auxv[42]; /* for /proc/PID/auxv */
 
_



Re: Fixes to mmtimer driver

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Andrew Morton wrote:

> Which I fixed up as below.

Thanks.

> Please, we've pushed in 14MB of patches in 11 days - it's really important
> to make sure that we're working against the latest tree.

Wow.


Re: Netfilter ipt_hashlimit

2005-03-14 Thread David S. Miller
On Fri, 11 Mar 2005 23:05:11 +1100
Herbert Xu <[EMAIL PROTECTED]> wrote:

> Russell King <[EMAIL PROTECTED]> wrote:
> > With current-ish Linus 2.6 BK, I'm seeing this:
> > 
> > net/ipv4/netfilter/ipt_hashlimit.c:96: warning: type defaults to `int' in 
> > declaration of `DECLARE_LOCK'
> > net/ipv4/netfilter/ipt_hashlimit.c:96: warning: parameter names (without 
> > types) in function declaration
> > 
> > Looks like ipt_hashlimit.c is missing an include?
> 
> Indeed.  It should include lockhelp.h directly.
> 
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks Herbert.


Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread Christoph Lameter
Note that similarities exist between the posix clock and the time sources.
Will all time sources be exportable as posix clocks?




Re: [PATCH][2/2] SquashFS

2005-03-14 Thread Greg KH
On Mon, Mar 14, 2005 at 04:30:33PM +, Phillip Lougher wrote:
> +typedef unsigned int squashfs_block;
> +typedef long long squashfs_inode;

Try using u32 and u64 instead.

> +typedef unsigned int squashfs_uid;

Why is this a typedef?

> +
> +typedef struct squashfs_super_block {

Don't typedef structures, it's not the kernel way.

thanks,

greg k-h


Re: Fixes to mmtimer driver

2005-03-14 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> Fix the issue that the timer sometimes will not fire if the scheduled
>  time has already expired. Plus some simplifications and style changes.

Tosses a reject due to the itimer patches which went in last week.

***************
*** 430,436 ****
if (n > 20)
return 1;
  
-   } while (mmtimer_setup(x->i, t->it_timer.expires));
  
return 0;
  }
--- 418,424 ----
if (n > 20)
return 1;
  
+   } while (!mmtimer_setup(x->i, t->it_timer.expires));
  
return 0;
  }

Which I fixed up as below.

Please, we've pushed in 14MB of patches in 11 days - it's really important
to make sure that we're working against the latest tree.


--- 25/drivers/char/mmtimer.c~fixes-to-mmtimer-driver   2005-03-14 21:33:04.0 -0800
+++ 25-akpm/drivers/char/mmtimer.c  2005-03-14 21:33:45.0 -0800
@@ -71,11 +71,6 @@ static struct file_operations mmtimer_fo
 };
 
 /*
- * Comparators and their associated info.  Shub has
- * three comparison registers.
- */
-
-/*
  * We only have comparison registers RTC1-4 currently available per
  * node.  RTC0 is used by SAL.
  */
@@ -174,14 +169,10 @@ static void inline mmtimer_setup_int_2(u
  * This function must be called with interrupts disabled and preemption off
  * in order to insure that the setup succeeds in a deterministic time frame.
  * It will check if the interrupt setup succeeded.
- * mmtimer_setup will return the cycles that we were too late if the
- * initialization failed.
  */
 static int inline mmtimer_setup(int comparator, unsigned long expires)
 {
 
-   long diff;
-
switch (comparator) {
case 0:
mmtimer_setup_int_0(expires);
@@ -194,17 +185,14 @@ static int inline mmtimer_setup(int comp
break;
}
/* We might've missed our expiration time */
-diff = rtc_time() - expires;
-   if (diff > 0) {
-   if (mmtimer_int_pending(comparator)) {
-   /* We'll get an interrupt for this once we're done */
-return 0;
-   }
-   /* Looks like we missed it */
-   return diff;
-}
+   if (rtc_time() < expires)
+   return 1;
 
-   return 0;
+   /*
+* If an interrupt is already pending then its okay
+* if not then we failed
+*/
+   return mmtimer_int_pending(comparator);
 }
 
 static int inline mmtimer_disable_int(long nasid, int comparator)
@@ -430,7 +418,7 @@ static int inline reschedule_periodic_ti
if (n > 20)
return 1;
 
-   } while (mmtimer_setup(x->i, t->it.mmtimer.expires));
+   } while (!mmtimer_setup(x->i, t->it.mmtimer.expires));
 
return 0;
 }
@@ -594,9 +582,15 @@ static int sgi_timer_set(struct k_itimer
 
if (flags & TIMER_ABSTIME) {
struct timespec n;
+   unsigned long now;
 
getnstimeofday(&n);
-   when -= timespec_to_ns(n);
+   now = timespec_to_ns(n);
+   if (when > now)
+   when -= now;
+   else
+   /* Fire the timer immediately */
+   when = 0;
}
 
/*
@@ -644,7 +638,7 @@ retry:
timr->it.mmtimer.expires = when;
 
if (period == 0) {
-   if (mmtimer_setup(i, when)) {
+   if (!mmtimer_setup(i, when)) {
mmtimer_disable_int(-1, i);
posix_timer_event(timr, 0);
timr->it.mmtimer.expires = 0;
_
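
The TIMER_ABSTIME hunk above guards against unsigned wraparound. A
standalone sketch of the same idea, using a made-up helper name and plain
64-bit nanosecond values rather than the driver's own types:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert an absolute expiry time to a relative delay, both in
 * nanoseconds. With unsigned arithmetic, "when - now" wraps to a huge
 * value once the deadline has passed, so the timer would look as if it
 * were scheduled far in the future and never fire. Clamp to 0 instead,
 * meaning "fire immediately".
 */
uint64_t abs_to_rel_ns(uint64_t when, uint64_t now)
{
	return when > now ? when - now : 0;
}
```

This mirrors only the clamping logic; how the driver then programs the
comparator is unchanged by it.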



Re: Exports to enable clock driver modules

2005-03-14 Thread Christoph Hellwig
On Mon, Mar 14, 2005 at 08:37:43PM -0800, Christoph Lameter wrote:
> The following exports are necessary to allow loadable modules to define
> new clocks. Without these the mmtimer driver cannot be built
> correctly as a module (there is another mmtimer-specific fix necessary to
> get it to build properly, but that will be a separate patch):

I'd say just disallow modular mmtimer instead.



Re: [PATCH RFC]: DEBUG for PCI IO & MEM allocation

2005-03-14 Thread Andrew Morton
Prarit Bhargava <[EMAIL PROTECTED]> wrote:
>
>  I propose the following patch to add a compile time DEBUG option to 
>  kernel/resource.c that would help in analyzing problems in this area.  
>  It's a few simple lines of output in  __request_resource, 
>  __release_resource, __request_region, and __release_region .
> 

A sane enough requirement.

>   
>  +DEBUGP("%s: resource request at 0x%lx-0x%lx\n", __FUNCTION__, 
> new->start, new->end);

Shouldn't this also be printing the ->name of the new resource?

A lot of the statements which you're adding will look screwy in an 80-col
xterm.  Please wrap 'em.
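
One way the two suggestions might be combined, as a userspace sketch: a
compile-time DEBUGP that also prints the resource name, with the statement
wrapped to fit 80 columns. The struct is a cut-down stand-in for the
kernel's struct resource, and format_request() and debug_request() are
names invented here, not part of the proposed patch.

```c
#include <assert.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct resource. */
struct resource {
	const char *name;
	unsigned long start, end;
};

/* Compiled out entirely unless DEBUG is defined at build time. */
#ifdef DEBUG
#define DEBUGP(fmt, ...) fprintf(stderr, fmt, ##__VA_ARGS__)
#else
#define DEBUGP(fmt, ...) do { } while (0)
#endif

/* Build the message in buf; returns what snprintf() returns. */
int format_request(char *buf, size_t len, const char *func,
		   const struct resource *res)
{
	assert(buf != NULL && res != NULL);
	return snprintf(buf, len,
			"%s: resource request %s at 0x%lx-0x%lx\n",
			func, res->name ? res->name : "<unnamed>",
			res->start, res->end);
}

/* Emit the message through DEBUGP; a no-op when DEBUG is not defined. */
void debug_request(const struct resource *res)
{
	char msg[128];

	format_request(msg, sizeof(msg), __func__, res);
	DEBUGP("%s", msg);
}
```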


Re: Awful long timeouts for flash-file-system

2005-03-14 Thread Robert Hancock
linux-os wrote:
> Hello IDE experts.
> I am trying to use a SanDisk SDCFB-256, CFA DISK drive. This
> is supposed to emulate an IDE drive and does (sort of). However,
> upon boot, the boot-code keeps trying and trying and trying to
> do SOMETHING that apparently isn't even necessary because the
> virtual disk is accessible and can be written/read and I can
> even boot from it.
>
> hdb: max request size: 128KiB
> hdb: 501760 sectors (256 MB) w/1KiB Cache, CHS=980/16/32, DMA
> hdb: cache flushes not supported
>  hdb:<4>hdb: dma_timer_expiry: dma status == 0x61

I'm assuming you're using a CF-to-IDE adapter to hook up the card. Most 
likely your CompactFlash card is indicating that it supports DMA and the 
kernel is trying to use it. However, many CF-to-IDE adapters don't hook 
up the DMA control lines properly so the requests all time out until the 
kernel gives up using DMA.

We use some Mesa Electronics CF-IDE adapters at work - some of the newer 
ones have some jumpers with positions NOR and DMA, DMA works if the 
jumpers are set to the DMA position. I don't think we've tried using any 
DMA-supporting CF cards on the older ones without these jumpers.

If the adapter you're using doesn't do DMA, I believe that options like
hdb=nodma or ide1=nodma, etc. will stop the kernel from trying to
use it.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: 2.6.11-mm3: SIS5513 DMA problem (set_drive_speed_status)

2005-03-14 Thread Andrew Morton
Martin Zwickel <[EMAIL PROTECTED]> wrote:
>
> Hi,
> 
> just tried the 2.6.11-mm3 and at boot-time my start scripts try to
> enable DMA on my disk (hdparm -m16 -c1 -u1 -X69 /dev/hda).
> 
> But while running hdparm, the kernel waits many seconds and gives me
> some DMA warnings/errors:
>
> ...
>
> hda: set_drive_speed_status: status=0xd0 { Busy }
> 
> ide: failed opcode was: unknown
> hda: dma_timer_expiry: dma status == 0x41
> hda: DMA timeout error
> hda: dma timeout error: status=0xd0 { Busy }
> ...
> 
> That happened also with 2.6.11-rc3 since I thought I should switch away
> from my 2.6.8-rc2-mm1 (the best kernel ever ;)).

Could you please check whether 2.6.12-rc1 does this?  It should be released
mid-week.  Thanks.


Re: [TTY] overrun notify issue during 5 minutes after booting

2005-03-14 Thread Andrew Morton
moreau francis <[EMAIL PROTECTED]> wrote:
>
> By the way, is it safe in "n_tty_receive_overrun" to
>  call
>  "printk" ? because the former can be called from IT
>  context...

yup.  printk() is safe from all contexts except NMI.


Re: MAC address instead of IP

2005-03-14 Thread Ben Greear
Donald Duckie wrote:
> Hi!
>
> I am looking for some sample code which uses MAC
> addresses instead of TCP-IP for data transmission. Any
> suggestions are highly appreciated.

Check out the 'man 7 socket' man page and read up on
raw packet sockets.  You can format a packet down to
the ethernet header and send it directly to the
interface transmit queue...

And all this safely from user-space.

Ben
--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
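
A rough user-space sketch of that approach, assuming Linux AF_PACKET
sockets (described in packet(7)). The function names, the choice of
EtherType and the payload are assumptions made for this example; sending
needs CAP_NET_RAW, so only the frame-building half can be exercised
without privileges.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* 0x88b5 is an EtherType set aside for local experimental use. */
#define EXAMPLE_ETHERTYPE 0x88b5

/*
 * Lay out an Ethernet II frame in buf: destination MAC, source MAC,
 * EtherType, then the payload. Returns the frame length, or 0 if buf
 * is too small.
 */
size_t build_frame(unsigned char *buf, size_t buflen,
		   const unsigned char dst[ETH_ALEN],
		   const unsigned char src[ETH_ALEN],
		   const void *payload, size_t paylen)
{
	struct ether_header hdr;

	if (buflen < sizeof(hdr) + paylen)
		return 0;

	memcpy(hdr.ether_dhost, dst, ETH_ALEN);
	memcpy(hdr.ether_shost, src, ETH_ALEN);
	hdr.ether_type = htons(EXAMPLE_ETHERTYPE);

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), payload, paylen);
	return sizeof(hdr) + paylen;
}

/*
 * Send a prebuilt frame out of interface ifname, addressed purely by
 * MAC. Returns 0 on success, -1 on any error.
 */
int send_frame(const char *ifname, const unsigned char *frame, size_t len)
{
	struct sockaddr_ll addr;
	ssize_t sent;
	int fd;

	assert(len >= ETH_HLEN);

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sll_family = AF_PACKET;
	addr.sll_ifindex = (int)if_nametoindex(ifname);
	addr.sll_halen = ETH_ALEN;
	memcpy(addr.sll_addr, frame, ETH_ALEN);	/* destination MAC */

	sent = sendto(fd, frame, len, 0,
		      (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return sent == (ssize_t)len ? 0 : -1;
}
```

The receiving side would open the same kind of socket and filter on the
EtherType; no IP addresses are involved at any point.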


Fixes to mmtimer driver

2005-03-14 Thread Christoph Lameter
Hmm. this somehow did not make it to akpm and lkml when I posted it a
week ago.

Fix the issue that the timer sometimes will not fire if the scheduled
time has already expired. Plus some simplifications and style changes.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Dimitri Sivanich <[EMAIL PROTECTED]>

Index: linux-2.6.11/drivers/char/mmtimer.c
===
--- linux-2.6.11.orig/drivers/char/mmtimer.c    2005-03-01 23:38:13.0 -0800
+++ linux-2.6.11/drivers/char/mmtimer.c 2005-03-09 09:48:40.0 -0800
@@ -71,11 +71,6 @@ static struct file_operations mmtimer_fo
 };

 /*
- * Comparators and their associated info.  Shub has
- * three comparison registers.
- */
-
-/*
  * We only have comparison registers RTC1-4 currently available per
  * node.  RTC0 is used by SAL.
  */
@@ -174,14 +169,10 @@ static void inline mmtimer_setup_int_2(u
  * This function must be called with interrupts disabled and preemption off
  * in order to insure that the setup succeeds in a deterministic time frame.
  * It will check if the interrupt setup succeeded.
- * mmtimer_setup will return the cycles that we were too late if the
- * initialization failed.
  */
 static int inline mmtimer_setup(int comparator, unsigned long expires)
 {

-   long diff;
-
switch (comparator) {
case 0:
mmtimer_setup_int_0(expires);
@@ -194,17 +185,14 @@ static int inline mmtimer_setup(int comp
break;
}
/* We might've missed our expiration time */
-diff = rtc_time() - expires;
-   if (diff > 0) {
-   if (mmtimer_int_pending(comparator)) {
-   /* We'll get an interrupt for this once we're done */
-return 0;
-   }
-   /* Looks like we missed it */
-   return diff;
-}
+   if (rtc_time() < expires)
+   return 1;

-   return 0;
+   /*
+* If an interrupt is already pending then its okay
+* if not then we failed
+*/
+   return mmtimer_int_pending(comparator);
 }

 static int inline mmtimer_disable_int(long nasid, int comparator)
@@ -430,7 +418,7 @@ static int inline reschedule_periodic_ti
if (n > 20)
return 1;

-   } while (mmtimer_setup(x->i, t->it_timer.expires));
+   } while (!mmtimer_setup(x->i, t->it_timer.expires));

return 0;
 }
@@ -594,9 +582,15 @@ static int sgi_timer_set(struct k_itimer

if (flags & TIMER_ABSTIME) {
struct timespec n;
+   unsigned long now;

getnstimeofday(&n);
-   when -= timespec_to_ns(n);
+   now = timespec_to_ns(n);
+   if (when > now)
+   when -= now;
+   else
+   /* Fire the timer immediately */
+   when = 0;
}

/*
@@ -644,7 +638,7 @@ retry:
timr->it_timer.expires = when;

if (period == 0) {
-   if (mmtimer_setup(i, when)) {
+   if (!mmtimer_setup(i, when)) {
mmtimer_disable_int(-1, i);
posix_timer_event(timr, 0);
timr->it_timer.expires = 0;


IA32 (2.6.11 - 2005-03-14.16.00) - 2 New warnings

2005-03-14 Thread John Cherry
drivers/char/drm/i915_dma.c:573: warning: `verify_area' is deprecated (declared 
at include/asm/uaccess.h:105)
drivers/char/drm/i915_dma.c:603: warning: `verify_area' is deprecated (declared 
at include/asm/uaccess.h:105)


Re: [PATCH] mm counter operations through macros

2005-03-14 Thread Christoph Lameter
Ok. Here is an updated patch:

This patch extracts all the operations on counters protected by the
page table lock (currently rss and anon_rss) into definitions in
include/linux/sched.h. All rss operations are performed through
the following macros:

get_mm_counter(mm, member)          -> Obtain the value of a counter
set_mm_counter(mm, member, value)   -> Set the value of a counter
add_mm_counter(mm, member, value)   -> Add to a counter
inc_mm_counter(mm, member)          -> Increment a counter
dec_mm_counter(mm, member)          -> Decrement a counter

With this patch it becomes easier to add new counters and it is possible
to redefine the method of counter handling. The counters are an
issue for scalability since they are used in frequently used code paths and
may cause cache line bouncing.

For example, one could drop the counters entirely and count the pages
when needed, switch to atomic operations if the mm_struct locking changes,
or split the rss into counters that can be incremented locally.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
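
To see how the accessors behave outside the kernel, here is a cut-down
userspace sketch. The struct keeps only the two counters, the macros gain
outer parentheses, and account_page() is a hypothetical helper loosely
modelled on the copy_one_pte() hunk; none of this is the patch itself.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long mm_counter_t;

/* Stand-in for struct mm_struct holding only the counters. */
struct mm_struct {
	mm_counter_t _rss;
	mm_counter_t _anon_rss;
};

/* Token pasting turns "rss" into the field name "_rss". */
#define set_mm_counter(mm, member, value) ((mm)->_##member = (value))
#define get_mm_counter(mm, member)        ((mm)->_##member)
#define add_mm_counter(mm, member, value) ((mm)->_##member += (value))
#define inc_mm_counter(mm, member)        ((mm)->_##member++)
#define dec_mm_counter(mm, member)        ((mm)->_##member--)

/* Account one newly mapped page, anonymous or not. */
void account_page(struct mm_struct *mm, int is_anon)
{
	assert(mm != NULL);
	inc_mm_counter(mm, rss);
	if (is_anon)
		inc_mm_counter(mm, anon_rss);
}
```

Because callers never touch the fields directly, swapping the macro
bodies for atomic operations or per-CPU deltas would not require
changing any call site.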

Index: linux-2.6.11/include/linux/sched.h
===
--- linux-2.6.11.orig/include/linux/sched.h 2005-03-11 10:29:17.0 -0800
+++ linux-2.6.11/include/linux/sched.h  2005-03-14 20:54:51.0 -0800
@@ -205,6 +205,13 @@ arch_get_unmapped_area_topdown(struct fi
 extern void arch_unmap_area(struct vm_area_struct *area);
 extern void arch_unmap_area_topdown(struct vm_area_struct *area);

+#define set_mm_counter(mm, member, value) (mm)->_##member = (value)
+#define get_mm_counter(mm, member) ((mm)->_##member)
+#define add_mm_counter(mm, member, value) (mm)->_##member += (value)
+#define inc_mm_counter(mm, member) (mm)->_##member++
+#define dec_mm_counter(mm, member) (mm)->_##member--
+typedef unsigned long mm_counter_t;
+#define MM_COUNTER_T(member) mm_counter_t _##member

 struct mm_struct {
struct vm_area_struct * mmap;   /* list of VMAs */
@@ -221,7 +228,7 @@ struct mm_struct {
atomic_t mm_count;  /* How many references to "struct mm_struct" (users count as 1) */
int map_count;  /* number of VMAs */
struct rw_semaphore mmap_sem;
-   spinlock_t page_table_lock; /* Protects page tables, mm->rss, mm->anon_rss */
+   spinlock_t page_table_lock; /* Protects page tables and some counters */

struct list_head mmlist;/* List of maybe swapped mm's.  These are globally strung
 * together off init_mm.mmlist, and are protected
@@ -231,9 +238,13 @@ struct mm_struct {
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
-   unsigned long rss, anon_rss, total_vm, locked_vm, shared_vm;
+   unsigned long total_vm, locked_vm, shared_vm;
unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes;

+   /* Special counters protected by the page_table_lock */
+   MM_COUNTER_T(rss);
+   MM_COUNTER_T(anon_rss);
+
unsigned long saved_auxv[42]; /* for /proc/PID/auxv */

unsigned dumpable:1;
Index: linux-2.6.11/mm/memory.c
===
--- linux-2.6.11.orig/mm/memory.c   2005-03-11 10:29:17.0 -0800
+++ linux-2.6.11/mm/memory.c    2005-03-14 20:50:20.0 -0800
@@ -312,9 +312,9 @@ copy_one_pte(struct mm_struct *dst_mm,
pte = pte_mkclean(pte);
pte = pte_mkold(pte);
get_page(page);
-   dst_mm->rss++;
+   inc_mm_counter(dst_mm, rss);
if (PageAnon(page))
-   dst_mm->anon_rss++;
+   inc_mm_counter(dst_mm, anon_rss);
set_pte_at(dst_mm, addr, dst_pte, pte);
page_dup_rmap(page);
 }
@@ -525,7 +525,7 @@ static void zap_pte_range(struct mmu_gat
if (pte_dirty(pte))
set_page_dirty(page);
if (PageAnon(page))
-   tlb->mm->anon_rss--;
+   dec_mm_counter(tlb->mm, anon_rss);
else if (pte_young(pte))
mark_page_accessed(page);
tlb->freed++;
@@ -1351,9 +1351,9 @@ static int do_wp_page(struct mm_struct *
page_table = pte_offset_map(pmd, address);
if (likely(pte_same(*page_table, pte))) {
if (PageAnon(old_page))
-   mm->anon_rss--;
+   dec_mm_counter(mm, anon_rss);
if (PageReserved(old_page))
-   ++mm->rss;
+   inc_mm_counter(mm, rss);
else
page_remove_rmap(old_page);
flush_cache_page(vma, address, pfn);
@@ -1759,7 

Re: Devices/Partitions over 2TB

2005-03-14 Thread jmerkey
Bernd Eckenfels wrote:
> In article <[EMAIL PROTECTED]> you wrote:
> > You have to ignore the partition table contents for ending cylinder.
>
> Why use MSDOS partition tables at all? What about LVM or GUID Partitions?

Good question.  Where are the standard tools in FC2 and FC3 for these types?

Jeff

> Gruss
> Bernd


MAC address instead of IP

2005-03-14 Thread Donald Duckie
Hi!

I am looking for some sample code which uses MAC
addresses instead of TCP-IP for data transmission. Any
suggestions are highly appreciated.

Also, I have been digging into ethertap.c and
netlink_dev.c, but I can't fully understand how this
code works. Where can I find detailed explanations
of these source files? What I usually find in a search
are compilation error problems and the like.

Thank you very much for any kind of information.


Regards.







Re: Exports to enable clock driver modules

2005-03-14 Thread Andrew Morton
Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> The following exports are necessary to allow loadable modules to define
>  new clocks.

I'll convert these to EXPORT_SYMBOL_GPL, OK?


Re: huge filesystems

2005-03-14 Thread jmerkey
Andrew Morton wrote:
> jmerkey <[EMAIL PROTECTED]> wrote:
> > > I don't recall you reporting any of them.  How can we expect to fix
> > > anything if we aren't told about it?
> >
> > I report them when I can't get around them myself. I've been able to get
> > around most of them.
>
> Jeff, that's all take and no give.
> Please give: what problems have you observed in the current VFS for devices
> and files less than 16TB?

1. Scaling issues with readdir() with huge numbers of files (not even
huge, really: 87000 files in a dir takes a while for readdir() to return
results). I average 2-3 million files per directory on 2.6.9. It can take
up to a minute for readdir() to return from the initial read of one of
these directories through the VFS.

2. NFS performance and stability issues with mapping NFS on top of dsfs.
All sorts of performance problems with system slowdowns -- in some cases I
can copy a file to a floppy, system to system, faster than I can copy it
over 100 Mbit Ethernet.

3. RCU and interrupt state problems with concurrent network I/O and VFS
interaction. In lots of places I reordered the code in these sections to
hold more coarse-grained locking.

4. BIO multiple chained requests have never worked correctly, so I always
have to submit 4K per BIO. The design and concept behind BIOs was great --
the implementation has a lot of problems. When I submit a chain larger
than 32 MB of 4K pages, the system loses state and the BIOs don't get
returned or completed. And I see some bizarre error returns from
submission. Jens' classic response is always "Merkey you don't understand
the interface" -- I have the code, I understand it quite well, it does not
work as advertised with these big sizes.

5. Files larger than 2TB work fine through the VFS provided I force mmap
to use the internal interface. Files larger than 4TB also seem to work
fine. I have also tested with files larger than 7TB; they also seem to
work fine. I have not tested individual files larger than 10TB yet, but
this will be happening in a month or so based on the units we are selling.
When I enable page cache mmap through the VFS, the system gets into
trouble with these five memory pools from hell (slab, and the various
allocators in Linux -- I would think one byte-level allocator would be
enough) and the system has problems with low memory conditions. I don't
use the buffer cache because I post these huge coalesced sector runs to
disk and need memory in contiguous chunks, so the page cache/buffer cache
don't optimize well in dsfs. I am achieving over 700 MB/s to disk with
custom hardware with the architecture I am using -- 6% processor
utilization on 2.6.9.

6. fdisk does not support drives larger than 2TB, so I have to hack the
partition tables and fake out dsfs with 3TB and 4TB drives created with
RAID 0 controllers and hardware. This needs to get fixed.

I will always give back changes to GPL code if folks ask for them -- buy
an appliance through OSDL (they are really cool) and request the GPL
changes to Linux and I'll provide them as requested.

Order one from www.soleranetworks.com. Ask for Troy.

Jeff



Re: Devices/Partitions over 2TB

2005-03-14 Thread Bernd Eckenfels
In article <[EMAIL PROTECTED]> you wrote:
> You have to ignore the partition table contents for ending cylinder.

Why use MSDOS partition tables at all? What about LVM or GUID Partitions?

Gruss
Bernd


Re: 2.6.11-mm3: saa7134-core.c compile error

2005-03-14 Thread Andrew Morton
Adrian Bunk <[EMAIL PROTECTED]> wrote:
>
> On Sat, Mar 12, 2005 at 03:42:22AM -0800, Andrew Morton wrote:
>  >...
>  > Changes since 2.6.11-mm2:
>  >...
>  > +saa7134-update.patch
>  >...
>  >  v4l updates
>  >...
> 
>  This doesn't compile with CONFIG_MODULES=n:
> 
>  <--  snip  -->
> 
>  ...
>CC  drivers/media/video/saa7134/saa7134-core.o
>  drivers/media/video/saa7134/saa7134-core.c: In function `saa7134_fini':
>  drivers/media/video/saa7134/saa7134-core.c:1215: error: `pending_registered' 
> undeclared (first use in this function)

Like this, I guess:

--- 25/drivers/media/video/saa7134/saa7134-core.c~saa7134-build-fix  2005-03-14 20:37:16.0 -0800
+++ 25-akpm/drivers/media/video/saa7134/saa7134-core.c  2005-03-14 20:37:27.0 -0800
@@ -1212,8 +1212,10 @@ static int saa7134_init(void)
 
 static void saa7134_fini(void)
 {
+#ifdef CONFIG_MODULES
if (pending_registered)
unregister_module_notifier(_notifier);
+#endif
pci_unregister_driver(_pci_driver);
 }
 
_



Exports to enable clock driver modules

2005-03-14 Thread Christoph Lameter
The following exports are necessary to allow loadable modules to define
new clocks. Without these the mmtimer driver cannot be built
correctly as a module (there is another mmtimer-specific fix necessary to
get it to build properly, but that will be a separate patch):

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.11/kernel/time.c
===
--- linux-2.6.11.orig/kernel/time.c 2005-03-01 23:37:50.0 -0800
+++ linux-2.6.11/kernel/time.c  2005-03-14 20:24:02.0 -0800
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -495,6 +496,8 @@ void getnstimeofday (struct timespec *tv
tv->tv_nsec = nsec;
 }

+EXPORT_SYMBOL(getnstimeofday);
+
 int do_settimeofday (struct timespec *tv)
 {
time_t wtm_sec, sec = tv->tv_sec;
Index: linux-2.6.11/kernel/posix-timers.c
===
--- linux-2.6.11.orig/kernel/posix-timers.c 2005-03-01 23:38:09.0 -0800
+++ linux-2.6.11/kernel/posix-timers.c  2005-03-14 20:24:02.0 -0800
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 

 #ifndef div_long_long_rem
 #include 
@@ -397,6 +398,8 @@ int posix_timer_event(struct k_itimer *t
}
 }

+EXPORT_SYMBOL(posix_timer_event);
+
 /*
  * This function gets called when a POSIX.1b interval timer expires.  It
  * is used as a callback from the kernel internal timer.  The
@@ -491,6 +494,8 @@ void register_posix_clock(int clock_id,
posix_clocks[clock_id] = *new_clock;
 }

+EXPORT_SYMBOL(register_posix_clock);
+
 static struct k_itimer * alloc_posix_timer(void)
 {
struct k_itimer *tmr;
@@ -1198,11 +1203,15 @@ int do_posix_clock_nosettime(struct time
return -EINVAL;
 }

+EXPORT_SYMBOL(do_posix_clock_nosettime);
+
 int do_posix_clock_notimer_create(struct k_itimer *timer)
 {
return -EINVAL;
 }

+EXPORT_SYMBOL(do_posix_clock_notimer_create);
+
 int do_posix_clock_nonanosleep(int which_clock, int flags, struct timespec *t)
 {
 #ifndef ENOTSUP
@@ -1212,6 +1221,8 @@ int do_posix_clock_nonanosleep(int which
 #endif
 }

+EXPORT_SYMBOL(do_posix_clock_nonanosleep);
+
 asmlinkage long
 sys_clock_settime(clockid_t which_clock, const struct timespec __user *tp)
 {


[PATCH] PPC64 iSeries: cleanup iSeries_setup

2005-03-14 Thread Stephen Rothwell
Hi Andrew,

This patch does some trivial cleanups on iSeries_setup.[ch]:
- eliminate warning about iommu_init_early_iSeries not being
  declared
- remove trailing whitespace
- change some functions to static
- remove defunct function declarations

Built and booted on iSeries.

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.c linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.c
--- linus-cleanup.1/arch/ppc64/kernel/iSeries_setup.c   2005-03-06 07:08:24.0 +1100
+++ linus-cleanup.2/arch/ppc64/kernel/iSeries_setup.c   2005-03-15 15:23:35.0 +1100
@@ -15,7 +15,7 @@
  *  as published by the Free Software Foundation; either version
  *  2 of the License, or (at your option) any later version.
  */
- 
+
 #undef DEBUG
 
 #include 
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "iSeries_setup.h"
@@ -57,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern void hvlog(char *fmt, ...);
 
@@ -72,7 +74,6 @@ extern void ppcdbg_initialize(void);
 static void build_iSeries_Memory_Map(void);
 static void setup_iSeries_cache_sizes(void);
 static void iSeries_bolt_kernel(unsigned long saddr, unsigned long eaddr);
-extern void iSeries_setup_arch(void);
 extern void iSeries_pci_final_fixup(void);
 
 /* Global Variables */
@@ -108,8 +109,8 @@ struct MemoryBlock {
  * and return the number of physical blocks and fill in the array of
  * block data.
  */
-unsigned long iSeries_process_Condor_mainstore_vpd(struct MemoryBlock *mb_array,
-   unsigned long max_entries)
+static unsigned long iSeries_process_Condor_mainstore_vpd(
+   struct MemoryBlock *mb_array, unsigned long max_entries)
 {
unsigned long holeFirstChunk, holeSizeChunks;
unsigned long numMemoryBlocks = 1;
@@ -154,7 +155,7 @@ unsigned long iSeries_process_Condor_mai
 #define MaxSegmentAdrRangeBlocks   128
 #define MaxAreaRangeBlocks 4
 
-unsigned long iSeries_process_Regatta_mainstore_vpd(
+static unsigned long iSeries_process_Regatta_mainstore_vpd(
struct MemoryBlock *mb_array, unsigned long max_entries)
 {
struct IoHriMainStoreSegment5 *msVpdP =
@@ -246,7 +247,7 @@ unsigned long iSeries_process_Regatta_ma
printk("  Bitmap range: %016lx - %016lx\n"
"Absolute range: %016lx - %016lx\n",
mb_array[i].logicalStart,
-   mb_array[i].logicalEnd, 
+   mb_array[i].logicalEnd,
mb_array[i].absStart, mb_array[i].absEnd);
mb_array[i].absStart = addr_to_chunk(mb_array[i].absStart &
0x000f);
@@ -261,7 +262,7 @@ unsigned long iSeries_process_Regatta_ma
return numSegmentBlocks;
 }
 
-unsigned long iSeries_process_mainstore_vpd(struct MemoryBlock *mb_array,
+static unsigned long iSeries_process_mainstore_vpd(struct MemoryBlock *mb_array,
unsigned long max_entries)
 {
unsigned long i;
@@ -302,7 +303,7 @@ static void __init iSeries_parse_cmdline
*p = 0;
 }
 
-/*static*/ void __init iSeries_init_early(void)
+static void __init iSeries_init_early(void)
 {
DBG(" -> iSeries_init_early()\n");
 
@@ -355,7 +356,7 @@ static void __init iSeries_parse_cmdline
 #ifdef CONFIG_SMP
smp_init_iSeries();
 #endif
-   if (itLpNaca.xPirEnvironMode == 0) 
+   if (itLpNaca.xPirEnvironMode == 0)
piranha_simulator = 1;
 
/* Associate Lp Event Queue 0 with processor 0 */
@@ -385,21 +386,21 @@ static void __init iSeries_parse_cmdline
 /*
  * The iSeries may have very large memories ( > 128 GB ) and a partition
  * may get memory in "chunks" that may be anywhere in the 2**52 real
- * address space.  The chunks are 256K in size.  To map this to the 
- * memory model Linux expects, the AS/400 specific code builds a 
+ * address space.  The chunks are 256K in size.  To map this to the
+ * memory model Linux expects, the AS/400 specific code builds a
  * translation table to translate what Linux thinks are "physical"
- * addresses to the actual real addresses.  This allows us to make 
+ * addresses to the actual real addresses.  This allows us to make
  * it appear to Linux that we have contiguous memory starting at
  * physical address zero while in fact this could be far from the truth.
- * To avoid confusion, I'll let the words physical and/or real address 
- * apply to the Linux addresses while I'll use "absolute address" to 
+ * To avoid confusion, I'll let the words physical and/or real address
+ * apply to the Linux addresses while I'll use "absolute address" to
  * refer to the actual hardware real address.
  *
- * 
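
The translation scheme described in the comment above can be pictured
with a small table lookup.  This is only a sketch under stated
assumptions: the 256K chunk size comes from the comment, while the table
size, the names and the table layout are invented here and are not the
iSeries code.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SHIFT	18			/* 256K chunks, per the comment */
#define CHUNK_SIZE	(1ULL << CHUNK_SHIFT)
#define NR_CHUNKS	8			/* hypothetical: a tiny partition */

/* Hypothetical table: Linux chunk number -> absolute chunk number. */
static uint64_t chunk_map[NR_CHUNKS];

/* Record where one Linux-visible chunk really lives. */
void map_chunk(uint64_t linux_chunk, uint64_t abs_chunk)
{
	assert(linux_chunk < NR_CHUNKS);
	chunk_map[linux_chunk] = abs_chunk;
}

/* Translate a Linux "physical" address to an absolute address. */
uint64_t phys_to_abs(uint64_t paddr)
{
	uint64_t chunk = paddr >> CHUNK_SHIFT;

	assert(chunk < NR_CHUNKS);
	/* Swap the chunk number, keep the offset within the chunk. */
	return (chunk_map[chunk] << CHUNK_SHIFT) | (paddr & (CHUNK_SIZE - 1));
}
```

Linux sees chunks 0, 1, 2, ... as contiguous, while the absolute chunk
numbers stored in the table can be scattered anywhere.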

Re: 2.6.11-mm3 breaks compile of drivers/char/esp.c

2005-03-14 Thread Andrew Morton
Bernhard Rosenkraenzer <[EMAIL PROTECTED]> wrote:
>
> drivers/char/esp.c: In function 'rs_stop':
>  drivers/char/esp.c:213: error: 'struct esp_struct' has no member named 'lock'
>  drivers/char/esp.c:219: error: 'struct esp_struct' has no member named 'lock'
>  drivers/char/esp.c: In function 'rs_start':
>  drivers/char/esp.c:230: error: 'struct esp_struct' has no member named 'lock'
>  drivers/char/esp.c:236: error: 'struct esp_struct' has no member named 'lock'

Seems that Alan's diff was missing the changes to the header file.  Like
this?

--- 25/include/linux/hayesesp.h~esp-build-fix   2005-03-14 20:31:18.0 -0800
+++ 25-akpm/include/linux/hayesesp.h    2005-03-14 20:31:30.0 -0800
@@ -77,6 +77,7 @@ struct hayes_esp_config {
 
 struct esp_struct {
int magic;
+   spinlock_t  lock;
int port;
int irq;
int flags;  /* defined in tty.h */
_
 

I didn't pick this up because ESPSERIAL is still BROKEN_ON_SMP.  Alan,
should we remove that now?



Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Lee Revell
On Sat, 2005-03-12 at 21:03 -0500, Jon Smirl wrote:
> On Fri, 11 Mar 2005 19:14:13 +, Alan Cox <[EMAIL PROTECTED]> wrote:
> > I posted a proposal for this sometime ago because X has some uses for
> > it. The idea being you'd pass a struct that describes
> > 
> > 1.  What tells you an IRQ occurred on this device
> > 2.  How to clear it
> > 3.  How to enable/disable it.
> > 
> > Something like
> > 
> > struct {
> > u8 type;/* 8, 16, 32  I/O or MMIO */
> > u8 bar; /* PCI bar to use */
> > u32 offset; /* Into bar */
> > u32 mask;   /* Bits to touch/compare */
> > u32 value;  /* Value to check against/set */
> > }
> >
> 
> It might be useful to add this to the main kernel API, and then over time
> modify all of the drivers to use it. If a driver does this it would be
> safe to transparently move it to user space like in UML or xen.  I've
> been told that PCI Express and MSI do not have this problem.
> 

This seems sufficient for the simplest devices, that just have an
IRQ_PENDING and an IRQ_ACK register.  But what about a device like the
emu10k1 where you have a half loop and loop interrupt for each of 64
channels, plus about 10 other interrupt sources?  The IPR just tells you
there's a channel loop interrupt pending; in order to properly ACK it
you need to set a bit in one of 4 registers depending on whether it's a
loop or half loop interrupt, and whether the channel is 0-31 or 32-63.
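
To make the contrast concrete, here is a rough userspace sketch.  The
struct fields follow the quoted proposal; the struct name, the helper
names, the fake register window and the register indices 0..3 are all
invented for illustration (they are not the real emu10k1 registers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields as in the quoted proposal; the struct name is made up. */
struct uirq_op {
	uint8_t  type;		/* access width in bits: 8, 16 or 32 */
	uint8_t  bar;		/* PCI BAR to use (unused in this sketch) */
	uint32_t offset;	/* byte offset into the BAR */
	uint32_t mask;		/* bits to touch/compare */
	uint32_t value;		/* value to check against/set */
};

/* Read op->type bits from a fake register window (little-endian). */
static uint32_t reg_read(const uint8_t *regs, size_t len,
			 const struct uirq_op *op)
{
	size_t width = op->type / 8, i;
	uint32_t v = 0;

	assert(width == 1 || width == 2 || width == 4);
	assert((size_t)op->offset + width <= len);
	for (i = 0; i < width; i++)
		v |= (uint32_t)regs[op->offset + i] << (8 * i);
	return v;
}

/* Simple device: "did it interrupt?" is one masked compare. */
int uirq_pending(const uint8_t *regs, size_t len, const struct uirq_op *op)
{
	return (reg_read(regs, len, op) & op->mask) == op->value;
}

/*
 * emu10k1-style case: which register and bit to ACK depends on the
 * event itself, so one fixed offset/mask pair cannot express it.
 */
void loop_ack_target(unsigned channel, int half_loop,
		     unsigned *reg_index, uint32_t *bit)
{
	assert(channel < 64);
	*reg_index = (half_loop ? 2u : 0u) + (channel >= 32 ? 1u : 0u);
	*bit = (uint32_t)1 << (channel % 32);
}
```

The first helper is all a static descriptor can drive; the second needs
per-interrupt computation, which is the part the struct does not cover.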

Lee



[PATCH] ppc32: Add virtual DMA support to legacy floppy driver on ppc32

2005-03-14 Thread Benjamin Herrenschmidt
This patch adds support for pseudo-dma transfers on ppc32 for the
legacy floppy driver. It is useful on some machines like Pegasos
where the legacy DMA doesn't seem to work properly (possibly due to
the lack of a "legacy" DMA zone on ppc32).

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Signed-off-by: Pavel Fedin <[EMAIL PROTECTED]>

Index: linux-work/include/asm-ppc/floppy.h
===
--- linux-work.orig/include/asm-ppc/floppy.h    2005-03-15 11:59:38.0 +1100
+++ linux-work/include/asm-ppc/floppy.h 2005-03-15 14:36:56.0 +1100
@@ -11,28 +11,149 @@
 #ifndef __ASM_PPC_FLOPPY_H
 #define __ASM_PPC_FLOPPY_H
 
-#define fd_inb(port)   inb_p(port)
-#define fd_outb(value,port)outb_p(value,port)
+#define fd_inb(port)   inb_p(port)
+#define fd_outb(value,port)outb_p(value,port)
 
-#define fd_enable_dma() enable_dma(FLOPPY_DMA)
-#define fd_disable_dma()disable_dma(FLOPPY_DMA)
-#define fd_request_dma()request_dma(FLOPPY_DMA,"floppy")
-#define fd_free_dma()   free_dma(FLOPPY_DMA)
-#define fd_clear_dma_ff()   clear_dma_ff(FLOPPY_DMA)
-#define fd_set_dma_mode(mode)   set_dma_mode(FLOPPY_DMA,mode)
-#define fd_set_dma_addr(addr)   set_dma_addr(FLOPPY_DMA,(unsigned int)virt_to_bus(addr))
-#define fd_set_dma_count(count) set_dma_count(FLOPPY_DMA,count)
+#define fd_disable_dma()   fd_ops->_disable_dma(FLOPPY_DMA)
+#define fd_free_dma()   fd_ops->_free_dma(FLOPPY_DMA)
+#define fd_get_dma_residue()fd_ops->_get_dma_residue(FLOPPY_DMA)
+#define fd_dma_setup(addr, size, mode, io) fd_ops->_dma_setup(addr, size, mode, io)
 #define fd_enable_irq() enable_irq(FLOPPY_IRQ)
 #define fd_disable_irq()disable_irq(FLOPPY_IRQ)
-#define fd_cacheflush(addr,size) /* nothing */
-#define fd_request_irq()request_irq(FLOPPY_IRQ, floppy_interrupt, \
-   SA_INTERRUPT|SA_SAMPLE_RANDOM, \
-   "floppy", NULL)
 #define fd_free_irq()   free_irq(FLOPPY_IRQ, NULL);
 
-__inline__ void virtual_dma_init(void)
+static int fd_request_dma(void);
+
+struct fd_dma_ops {
+   void (*_disable_dma)(unsigned int dmanr);
+   void (*_free_dma)(unsigned int dmanr);
+   int (*_get_dma_residue)(unsigned int dummy);
+   int (*_dma_setup)(char *addr, unsigned long size, int mode, int io);
+};
+
+static int virtual_dma_count;
+static int virtual_dma_residue;
+static char *virtual_dma_addr;
+static int virtual_dma_mode;
+static int doing_vdma;
+static struct fd_dma_ops *fd_ops;
+
+static irqreturn_t floppy_hardint(int irq, void *dev_id, struct pt_regs * regs)
+{
+   unsigned char st;
+   int lcount;
+   char *lptr;
+
+   if (!doing_vdma)
+   return floppy_interrupt(irq, dev_id, regs);
+
+
+   st = 1;
+   for (lcount=virtual_dma_count, lptr=virtual_dma_addr; 
+lcount; lcount--, lptr++) {
+   st=inb(virtual_dma_port+4) & 0xa0 ;
+   if (st != 0xa0) 
+   break;
+   if (virtual_dma_mode)
+   outb_p(*lptr, virtual_dma_port+5);
+   else
+   *lptr = inb_p(virtual_dma_port+5);
+   }
+   virtual_dma_count = lcount;
+   virtual_dma_addr = lptr;
+   st = inb(virtual_dma_port+4);
+
+   if (st == 0x20)
+   return IRQ_HANDLED;
+   if (!(st & 0x20)) {
+   virtual_dma_residue += virtual_dma_count;
+   virtual_dma_count=0;
+   doing_vdma = 0;
+   floppy_interrupt(irq, dev_id, regs);
+   return IRQ_HANDLED;
+   }
+   return IRQ_HANDLED;
+}
+
+static void vdma_disable_dma(unsigned int dummy)
+{
+   doing_vdma = 0;
+   virtual_dma_residue += virtual_dma_count;
+   virtual_dma_count=0;
+}
+
+static void vdma_nop(unsigned int dummy)
+{
+}
+
+
+static int vdma_get_dma_residue(unsigned int dummy)
+{
+   return virtual_dma_count + virtual_dma_residue;
+}
+
+
+static int fd_request_irq(void)
+{
+   if (can_use_virtual_dma)
+   return request_irq(FLOPPY_IRQ, floppy_hardint,SA_INTERRUPT,
+  "floppy", NULL);
+   else
+   return request_irq(FLOPPY_IRQ, floppy_interrupt,
+  SA_INTERRUPT|SA_SAMPLE_RANDOM,
+  "floppy", NULL); 
+
+}
+
+static int vdma_dma_setup(char *addr, unsigned long size, int mode, int io)
+{
+   doing_vdma = 1;
+   virtual_dma_port = io;
+   virtual_dma_mode = (mode  == DMA_MODE_WRITE);
+   virtual_dma_addr = addr;
+   virtual_dma_count = size;
+   virtual_dma_residue = 0;
+   return 0;
+}
+
+static int hard_dma_setup(char *addr, unsigned long size, int mode, int io)
+{
+   /* actual, physical DMA */
+   

ppc32: Fix overflow in cpuinfo freq. display

2005-03-14 Thread Benjamin Herrenschmidt
Hi !

The CPU frequency in /proc/cpuinfo would overflow because of a
signed/unsigned bug. This fixes it.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Index: linux-work/arch/ppc/kernel/setup.c
===
--- linux-work.orig/arch/ppc/kernel/setup.c 2005-03-15 13:55:31.0 +1100
+++ linux-work/arch/ppc/kernel/setup.c  2005-03-15 14:21:27.0 +1100
@@ -338,14 +338,15 @@
 of_show_percpuinfo(struct seq_file *m, int i)
 {
struct device_node *cpu_node;
-   int *fp, s;
+   u32 *fp;
+   int s;

cpu_node = find_type_devices("cpu");
if (!cpu_node)
return 0;
for (s = 0; s < i && cpu_node->next; s++)
cpu_node = cpu_node->next;
-   fp = (int *) get_property(cpu_node, "clock-frequency", NULL);
+   fp = (u32 *)get_property(cpu_node, "clock-frequency", NULL);
if (fp)
seq_printf(m, "clock\t\t: %dMHz\n", *fp / 100);
return 0;
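
The effect of the signed/unsigned mix-up can be reproduced in userspace.
This is a sketch, assuming the property holds a 32-bit frequency in Hz
that is scaled down to MHz; the helper names are hypothetical and only
mirror the cast that the patch changes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old behaviour: the 32-bit property is read as a signed int. */
int mhz_signed(const void *prop)
{
	int32_t hz;

	assert(prop != NULL);
	memcpy(&hz, prop, sizeof(hz));	/* same 32 bits, signed view */
	return hz / 1000000;
}

/* Patched behaviour: the same bits are read as an unsigned value. */
unsigned int mhz_unsigned(const void *prop)
{
	uint32_t hz;

	assert(prop != NULL);
	memcpy(&hz, prop, sizeof(hz));	/* same 32 bits, unsigned view */
	return hz / 1000000;
}
```

Any frequency at or above 2^31 Hz (about 2147 MHz) has its top bit set,
so the signed view goes negative: 2500000000 Hz reads as -1794 through
the signed helper and as 2500 through the unsigned one.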




[PATCH] ppc32: Update PowerMac models table

2005-03-14 Thread Benjamin Herrenschmidt
Hi !

This patch updates the table of PowerMac models, adding the Mac mini, a
few missing ones in older slots too, and sorts it in a more logical way.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Index: linux-work/arch/ppc/platforms/pmac_feature.c
===
--- linux-work.orig/arch/ppc/platforms/pmac_feature.c   2005-03-15 11:56:42.0 +1100
+++ linux-work/arch/ppc/platforms/pmac_feature.c    2005-03-15 13:55:37.0 +1100
@@ -2022,10 +2022,11 @@
 #endif /* CONFIG_POWER4 */
 
 static struct pmac_mb_def pmac_mb_defs[] __pmacdata = {
-   /* Warning: ordering is important as some models may claim
-* beeing compatible with several types
-*/
 #ifndef CONFIG_POWER4
+   /*
+* Desktops
+*/
+
{   "AAPL,8500","PowerMac 8500/8600",
PMAC_TYPE_PSURGE,   NULL,
0
@@ -2058,14 +2059,6 @@
PMAC_TYPE_GAZELLE,  NULL,
0
},
-   {   "AAPL,3400/2400",   "PowerBook 3400",
-   PMAC_TYPE_HOOPER,   ohare_features,
-   PMAC_MB_CAN_SLEEP | PMAC_MB_MOBILE
-   },
-   {   "AAPL,3500","PowerBook 3500",
-   PMAC_TYPE_KANGA,ohare_features,
-   PMAC_MB_CAN_SLEEP | PMAC_MB_MOBILE
-   },
{   "AAPL,Gossamer","PowerMac G3 (Gossamer)",
PMAC_TYPE_GOSSAMER, heathrow_desktop_features,
0
@@ -2074,42 +2067,6 @@
PMAC_TYPE_SILK, heathrow_desktop_features,
0
},
-   {   "AAPL,PowerBook1998",   "PowerBook Wallstreet",
-   PMAC_TYPE_WALLSTREET,   heathrow_laptop_features,
-   PMAC_MB_CAN_SLEEP | PMAC_MB_MOBILE
-   },
-   {   "PowerBook1,1", "PowerBook 101 (Lombard)",
-   PMAC_TYPE_101_PBOOK,paddington_features,
-   PMAC_MB_MAY_SLEEP | PMAC_MB_MOBILE
-   },
-   {   "iMac,1",   "iMac (first generation)",
-   PMAC_TYPE_ORIG_IMAC,paddington_features,
-   0
-   },
-   {   "PowerMac4,1",  "iMac \"Flower Power\"",
-   PMAC_TYPE_PANGEA_IMAC,  pangea_features,
-   PMAC_MB_MAY_SLEEP
-   },
-   {   "PowerBook4,3", "iBook 2 rev. 2",
-   PMAC_TYPE_IBOOK2,   pangea_features,
-   PMAC_MB_MAY_SLEEP | PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE
-   },
-   {   "PowerBook4,2", "iBook 2",
-   PMAC_TYPE_IBOOK2,   pangea_features,
-   PMAC_MB_MAY_SLEEP | PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE
-   },
-   {   "PowerBook4,1", "iBook 2",
-   PMAC_TYPE_IBOOK2,   pangea_features,
-   PMAC_MB_MAY_SLEEP | PMAC_MB_HAS_FW_POWER | PMAC_MB_MOBILE
-   },
-   {   "PowerMac4,4",  "eMac",
-   PMAC_TYPE_EMAC, core99_features,
-   PMAC_MB_MAY_SLEEP
-   },
-   {   "PowerMac4,2",  "Flat panel iMac",
-   PMAC_TYPE_FLAT_PANEL_IMAC,  pangea_features,
-   PMAC_MB_CAN_SLEEP
-   },
{   "PowerMac1,1",  "Blue G3",
PMAC_TYPE_YOSEMITE, paddington_features,
0
@@ -2118,9 +2075,13 @@
PMAC_TYPE_YIKES,paddington_features,
0
},
-   {   "PowerBook2,1", "iBook (first generation)",
-   PMAC_TYPE_ORIG_IBOOK,   core99_features,
-   PMAC_MB_CAN_SLEEP | PMAC_MB_OLD_CORE99 | PMAC_MB_MOBILE
+   {   "PowerMac2,1",  "iMac FireWire",
+   PMAC_TYPE_FW_IMAC,  core99_features,
+   PMAC_MB_MAY_SLEEP | PMAC_MB_OLD_CORE99
+   },
+   {   "PowerMac2,2",  "iMac FireWire",
+   PMAC_TYPE_FW_IMAC,  core99_features,
+   PMAC_MB_MAY_SLEEP | PMAC_MB_OLD_CORE99
},
{   "PowerMac3,1",  "PowerMac G4 AGP Graphics",
PMAC_TYPE_SAWTOOTH, core99_features,
@@ -2134,30 +2095,96 @@
PMAC_TYPE_SAWTOOTH, core99_features,
PMAC_MB_MAY_SLEEP | PMAC_MB_OLD_CORE99
},
-   {   "PowerMac2,1",  "iMac FireWire",
-   PMAC_TYPE_FW_IMAC,  core99_features,
-   PMAC_MB_MAY_SLEEP | PMAC_MB_OLD_CORE99
+   {   "PowerMac3,4",  "PowerMac G4 Silver",
+   PMAC_TYPE_QUICKSILVER,  core99_features,
+  

Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Peter Chubb
> "Jon" == Jon Smirl <[EMAIL PROTECTED]> writes:

Jon> On Tue, 15 Mar 2005 14:47:42 +1100, Peter Chubb
Jon> <[EMAIL PROTECTED]> wrote:
>> What I really want to do is deprivilege the driver code as much as
>> possible.  Whatever a driver does, the rest of the system should
>> keep going.  That way malicious or buggy drivers can only affect
>> the processes that are trying to use the device they manage.
>> Moreover, it should be possible to kill -9 a driver, then restart
>> it, without the rest of the system noticing more than a hiccup.  To
>> do this, step one is to run the driver in user space, so that it's
>> subject to the same resource management control as any other
>> process.  Step two, which is a lot harder, is to connect the driver
>> back into the kernel so that it can be shared.  Tun/Tap can be used
>> for network devices, but it's really too slow -- you need zero-copy
>> and shared notification.

Jon> Have you considered running the drivers in a domain under Xen?

See the paper presented by Karlsruhe at OSDI:

Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and Stefan Götz:
Unmodified Device Driver Reuse and Improved System Dependability via
Virtual Machines.  OSDI '04.

They're using L4, rather than Xen as the paravirtualisation layer.

-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*


Re: huge filesystems

2005-03-14 Thread Andrew Morton
jmerkey <[EMAIL PROTECTED]> wrote:
>
>  >I don't recall you reporting any of them.  How can we expect to fix
>  >anything if we aren't told about it?
>  >
>  >  
>  >
>  I report them when I can't get around them myself. I've been able to get
>  around most of them.

Jeff, that's all take and no give.

Please give: what problems have you observed in the current VFS for devices
and files less than 16TB?
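
The 16TB figure is consistent with a 32-bit page index and 4KB pages.
Assuming that is where it comes from, the arithmetic can be sketched as
below; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable through a page cache whose page index has
 * `index_bits` bits and whose pages are `page_size` bytes.
 */
uint64_t page_cache_limit(unsigned index_bits, uint64_t page_size)
{
	assert(index_bits < 64);
	return ((uint64_t)1 << index_bits) * page_size;
}
```

For index_bits = 32 and page_size = 4096 this gives 2^44 bytes, i.e.
16 TiB.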



tired of wireless + WEP... uff

2005-03-14 Thread Yaroslav Halchenko
Dear Kernel People

Please advise... Long ago, when I didn't use WEP, I had my internal card (Network
controller: Intersil Corporation: Unknown device 3872) and a PCMCIA Belkin F5D6020
(probably version 1) working like a charm without tweaking (although I thought I
had to set RTS to 256 or call cardctl reset from time to time...)

Then I moved to a Netgear WG511 using the prism54 driver. Everything was
working well until (after some of the upgrades around 2.6.7-8) it started
freezing up my laptop completely. Although the SysRq combination can still
reboot it, nothing else works. At first I thought that it had something to
do with hardware (I got inside the laptop a couple of times to fix the
LCD, touchpad or fan) because it seemed to coincide with movement of the
laptop -- I move it and it dies... but it also died a couple of times without
any physical disturbance... also it doesn't die with other kinds of PCMCIA
wireless cards, but it died with the same model when I exchanged it...

Then I switched to my internal card and came back to issues which
seem to be general for my laptop and PCMCIA orinoco cards, because I've
tried the Belkin as well, and after 1-2 minutes of work they start
reporting

Mar 14 21:56:28 localhost kernel: eth3: Error -16 transmitting packet
Mar 14 21:56:28 localhost kernel: hermes @ IO 0x100: Error -16 issuing command.

syslog and cardmgr become busy as hell... laptop becomes useless...

I had to take Belkin card away from slot but here are some details:
(I'm running 2.6.11.3 at the moment)

eth3: Station identity 001f:0003::0008
eth3: Looks like an Intersil firmware version 0.8.3
eth3: Ad-hoc demo mode supported
eth3: IEEE standard IBSS ad-hoc mode supported
eth3: WEP supported, 104-bit key
eth3: MAC address 00:30:BD:61:20:D9
eth3: Station name "Prism  I"
eth3: ready
eth3: index 0x01: Vcc 5.0, irq 3, io 0x0100-0x013f
eth3: New link status: Connected (0001)



Socket 0:
  product info: "Belkin", "11Mbps Wireless Notebook Network Adapter", "Version 
01.02", ""
  manfid: 0x0156, 0x0002
  function: 6 (network)
Socket 0:
  dev_info
NULL 0ns, 512b
  attr_dev_info
SRAM 500ns, 1kb
  vers_1 5.0, "Belkin", "11Mbps Wireless Notebook Network Adapter",
"Version 01.02", ""
  manfid 0x0156, 0x0002
  funcid network_adapter
  lan_technology wireless
  lan_speed 1 mb/sec
  lan_speed 2 mb/sec
  lan_speed 5 mb/sec
  lan_speed 11 mb/sec
  lan_media 2.4_GHz
  lan_node_id 00 30 bd 61 20 d9
  lan_connector Closed connector standard
  config base 0x03e0 mask 0x0001 last_index 0x01
  cftable_entry 0x01 [default]
Vcc Vmin 4750mV Vmax 5250mV Iavg 300mA Ipeak 300mA
Idown 10mA
io 0x-0x003f [lines=6] [16bit]
irq mask 0x [level] [pulse]

Please advise on where to look for reasons of this weird behavior... I
want to be friendly with WEP :-)


-- 
  Yaroslav Halchenko
  Research Assistant, Psychology Department, Rutgers
  Office  (973) 353-5440 x263  Fax (973) 353-1171
   Ph.D. Student  CS Dept. NJIT



Re: [PATCH 0/4] sparsemem intro patches

2005-03-14 Thread Dave Hansen
On Mon, 2005-03-14 at 18:30 -0800, Andrew Morton wrote:
> Dave Hansen <[EMAIL PROTECTED]> wrote:
> >
> >  The following four patches provide the last needed changes before the
> >  introduction of sparsemem.  For a more complete description of what this
> >  will do, please see this patch:
> > 
> >  
> > http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch
> 
> I don't know what to think about this.  Can you describe sparsemem a little
> further, differentiate it from discontigmem and tell us why we want one?
>
> Is it for memory hotplug?  If so, how does it support hotplug?

Sparsemem is more flexible than discontig, and not tied to any existing
NUMA or MM structures like zones or pgdats.  That makes it ideal for
hotplug where those structures are going to be coming and going, sliced
and diced.

Another advantage is that sparse doesn't require each NUMA node's ranges
to be contiguous.  It can handle overlapping ranges between nodes with
no problems, where DISCONTIGMEM currently throws away that memory.
DISCONTIGMEM also requires that memory *inside* of a node be contiguous,
and have mem_map for all of it.  A once 64GB NUMA node with 63GB of the
memory removed wouldn't have much space left for anything but its
mem_map without sparsemem.
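
To put rough numbers on that last point, here is a small sketch of the
mem_map cost.  The 4KB page size and the 64-byte per-page descriptor are
assumptions for illustration; real values depend on the architecture and
kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of mem_map needed to cover `span_bytes` of address range,
 * given the page size and the size of one per-page descriptor.
 */
uint64_t mem_map_bytes(uint64_t span_bytes, uint64_t page_size,
		       uint64_t page_struct_size)
{
	assert(page_size != 0);
	return span_bytes / page_size * page_struct_size;
}
```

Under those assumptions a 64GB span needs 1GB of mem_map, which is the
entire 1GB left in the node, whereas covering only the 1GB that is
still present would need 16MB.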

> To which architectures is this useful, and what is the attitude of the
> relevant maintenance teams?

We have implementations for NUMAQ, x86 Summit, flat x86, flat x86-64,
flat and NUMA ppc64, and some ia64 configurations.  All of those can
either do simulated, virtualized, or actual hardware memory hotplug of
some kind based on the sparsemem implementations. 

Not to put words in their mouths, but there hasn't been anything
negative that I can recall in a while from the architecture maintainers.
What was said that was negative was months ago, and resolved.  We've
been talking about this to most of them for quite a while now, and I
think they've grown accustomed to the idea. :)

I've cc'd all of the guilty parties.  Perhaps they can fill in my vague
statements with actual facts.  But, here are the vague statements
anyway:

  i386 - Martin Bligh seems happy with it, he helped design it.
x86-64 - Matt Tolentino has approached Andi Kleen with the necessary
 cleanups, and I believe the reaction has been positive.  I
 think Andi had some other non-hotplug plans for sparsemem, too.
 ppc64 - I can bribe Anton and Paul's employer.  Mike Kravetz and Joel
 Schopp have been working on this port, and I believe they've
 kept the maintainers informed and calm.
  ia64 - Quote from Jesse Barnes (November 19, 2004):

> CONFIG_NONLINEAR (SPARSE's old name) should be the *only*
> memory init code on ia64  when this is done.  That means
> getting rid of both discontig and contig and virtual memmap...

 I believe Jesse's been keeping up with the development as well.


> Quoting from the above patch:
> 
> > Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that
> > it can eventually become a complete replacement.
> > ...
> > This patch introduces CONFIG_FLATMEM.  It is used in almost all
> > cases where there used to be an #ifndef DISCONTIG, because
> > SPARSEMEM and DISCONTIGMEM often have to compile out the same areas
> > of code.
> 
> Would I be right to worry about increasing complexity, decreased
> maintainability and generally increasing mayhem?

You certainly would be.  For the time being, this increases the number
of config options and places for us to screw up.  However, I am
confident at this point that we're doing the right thing.  We had a more
complicated version of sparsemem at first.  We stripped it down to the
bare bones, and that's what we would like to submit soon.  It has the
capability to replace discontig, and will eventually _reduce_
complexity.

One of my favorite ways to demonstrate why I think it's *simple* is the
architecture ports.  The longest added function that I can find in the
ports is 17 lines including whitespace.

139 insertions(+), 36 deletions(-) for ia64:
http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-180-sparsemem-ia64.patch

75 insertions(+), 17 deletions(-) for ppc64:
http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-170-sparsemem-ppc64.patch

x86_64 is broken up a little more, but it's probably smaller than the
ppc64 one.

> If a competent kernel developer who is not familiar with how all this code
> hangs together wishes to acquaint himself with it, what steps should he
> take?

Dan Phillips spelled out the basic concepts of chopping things up into
sections a few years ago:

http://lwn.net/2002/0411/a/discontig.php3

However, we haven't yet implemented the phys_to_virt() translations that
he envisioned.  We don't need that unless we need some advanced
hot-remove features, which are many, many months away.
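
The basic idea of chopping the page frame space into sections can be
sketched in a few lines of userspace C.  Every name and constant below
is hypothetical and chosen for illustration; this is not the sparsemem
patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define SECTION_SHIFT		8	/* hypothetical: 2^8 pages per section */
#define PAGES_PER_SECTION	(1UL << SECTION_SHIFT)
#define NR_SECTIONS		64	/* hypothetical table size */

struct page_info {			/* stand-in for struct page */
	unsigned long flags;
};

struct section {
	struct page_info *map;		/* NULL: no memory behind this section */
};

static struct section sections[NR_SECTIONS];

/* Attach a per-section map, e.g. when memory is hot-added. */
void section_online(unsigned long nr, struct page_info *map)
{
	assert(nr < NR_SECTIONS);
	sections[nr].map = map;
}

/* Detach it again, e.g. when the memory is removed. */
void section_offline(unsigned long nr)
{
	assert(nr < NR_SECTIONS);
	sections[nr].map = NULL;
}

/* Translate a page frame number to its descriptor, or NULL for a hole. */
struct page_info *pfn_to_info(unsigned long pfn)
{
	unsigned long nr = pfn >> SECTION_SHIFT;

	if (nr >= NR_SECTIONS || sections[nr].map == NULL)
		return NULL;
	return &sections[nr].map[pfn & (PAGES_PER_SECTION - 1)];
}
```

Because each section carries its own map, holes cost one NULL pointer
each and sections can come and go independently of any node or zone
layout.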

Where should a competent kernel developer look to understand the code

Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Jon Smirl
On Tue, 15 Mar 2005 14:47:42 +1100, Peter Chubb
<[EMAIL PROTECTED]> wrote:
> What I really want to do is deprivilege the driver code as much as
> possible.  Whatever a driver does, the rest of the system should keep
> going.  That way malicious or buggy drivers can only affect the
> processes that are trying to use the device they manage.  Moreover, it
> should be possible to kill -9 a driver, then restart it, without the
> rest of the system noticing more than a hiccup.  To do this,
> step one is to run the driver in user space, so that it's subject to
> the same resource management control as any other process.  Step two,
> which is a lot harder, is to connect the driver back into the kernel
> so that it can be shared.  Tun/Tap can be used for network devices,
> but it's really too slow -- you need zero-copy and shared notification.

Have you considered running the drivers in a domain under Xen?

> 
> --
> Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
> The technical we do immediately,  the political takes *forever*
> 


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: huge filesystems

2005-03-14 Thread jmerkey
Andrew Morton wrote:

> jmerkey <[EMAIL PROTECTED]> wrote:
>
>> I am running the DSFS file system as a 7 TB file system on 2.6.9.
>
> On a 32-bit CPU?

Yep.

>> There are a host of problems with the current VFS,
>
> I don't recall you reporting any of them.  How can we expect to fix
> anything if we aren't told about it?

I report them when I can't get around them myself. I've been able to get
around most of them.

>> and I have gotten around most of them
>> by **NOT** using the linux page cache interface.
>
> Well that won't fly.

For this application it will.

> The VFS should support devices up to 16TB on 32-bit CPUs.  If you know of
> scenarios in which it fails to do that, please send a bug report.

Based on the changes I've made to it locally for my version of 2.6.9, it
now goes to 1 zettabyte (1024 petabytes). The largest one I've configured
so far with actual storage is 128 TB, though. Had to drop the page cache
and replace it, though -- for now.

Jeff


Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Peter Chubb
> "Jon" == Jon Smirl <[EMAIL PROTECTED]> writes:

Jon> On Mon, 14 Mar 2005 12:42:27 +1100, Peter Chubb
Jon> <[EMAIL PROTECTED]> wrote:
>> > "Jon" == Jon Smirl <[EMAIL PROTECTED]> writes:
>> 
>> >> The scenario I'm thinking about with these patches are things
>> >> like low-latency user-level networking between nodes in a
>> >> cluster, where for good performance even with a kernel driver
>> >> you don't want to share your interrupt line with anything else.
>> 
Jon> The code needs to refuse to install if the IRQ line is shared.
>>  It does.  The request_irq() call explicitly does not include
>> SA_SHARED in its flags, so if the line is shared, it'll return an
>> error to user space when the driver tries to open the file
>> representing the interrupt.

Jon> Please put some big comments warning people about adding
Jon> SA_SHARED. I can easily see someone thinking that they are fixing
Jon> a bug by adding it. I'd probably even write a paragraph about
Jon> what will happen if SA_SHARED is added.

Will do.  The main problem here is X86, as other architectures either
don't care, or have enough interrupt lines.  And the people who are
paying me for this kind of thing all run IA64.
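
The sharing rule discussed above can be modelled in a few lines of
user-space C. This is only a toy sketch, not kernel code: struct irq_line
and model_request_irq() are hypothetical names invented here, and the model
assumes the usual behaviour that a second handler is accepted only when
both sides opted in to sharing. (The thread writes SA_SHARED; as far as I
know the flag in 2.6-era kernels is spelled SA_SHIRQ.)

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of one interrupt line; hypothetical, not a kernel structure. */
struct irq_line {
    int  handlers;   /* number of handlers currently registered */
    bool shareable;  /* true only if the registered handlers asked to share */
};

/*
 * Model of the registration rule: a handler on a busy line is accepted
 * only when the existing handlers and the newcomer all asked for sharing.
 * Returns 0 on success or -EBUSY, the error a driver would pass back to
 * user space when the open of the interrupt file fails.
 */
static int model_request_irq(struct irq_line *line, bool want_shared)
{
    if (line->handlers > 0 && !(line->shareable && want_shared))
        return -EBUSY;
    if (line->handlers == 0)
        line->shareable = want_shared;
    line->handlers++;
    return 0;
}
```

In this model an exclusive request on a line that already has a handler
fails, which is the behaviour the user-mode driver relies on, and adding
the sharing flag on only one side does not make the request succeed.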

What I really want to do is deprivilege the driver code as much as
possible.  Whatever a driver does, the rest of the system should keep
going.  That way malicious or buggy drivers can only affect the
processes that are trying to use the device they manage.  Moreover, it
should be possible to kill -9 a driver, then restart it, without the
rest of the system noticing more than a hiccup.  To do this,
step one is to run the driver in user space, so that it's subject to
the same resource management control as any other process.  Step two,
which is a lot harder, is to connect the driver back into the kernel
so that it can be shared.  Tun/Tap can be used for network devices,
but it's really too slow -- you need zero-copy and shared notification.


-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*


Re: [swsusp/ppc] Re: What's going on here ?

2005-03-14 Thread hugang
On Tue, Mar 15, 2005 at 01:19:46PM +1100, Benjamin Herrenschmidt wrote:
> 
> > rjw and hugang did (pretty necessary) changes to base swsusp (pagedir
> > table -> pagedir linklist), that unfortunately needed update to all
> > the assembly parts. It was series 1/3 update core, i386 and x86-64,
> > 2/3 update ppc, 3/3 introduce initramfs.
> > 
> > This is the offending patch I believe (but the version that was merged
> > was From: me, without code changes).
> > 
> > I realized that patch does more than changing from table to linklist,
> > but it looked mostly okay, so I forwarded it. Sorry.
> 
> It does more than that ... it _adds_ swsusp to ppc ! swsusp wasn't in
> mainline at all for ppc because I consider it not ready. And even the
> asm change should go through me anyway since i wrote that code and I'm
> not sure they know all the possible "issues" with that code.
> 
> > So, what to do now?
> > 
> > a) just revert it
> > 
> > or
> > 
> > b) revert pmac_setup.c and via-pmu parts and Kconfig part
> > 
> > or
> > 
> > c) just disable Kconfig part and fix it up with incremental patches

I hope it can stay merged; it works fine on my PowerBook G4.

> 
> I'll decide later today. I may well keep it and do the cleanup I had in
> mind on top of this, which means merging the pmac suspend-to-ram with
> the common infrastructure. But that will need some changes & hooks to
> the core swsusp.
> 
> Ben.


-- 
Hu Gang   .-.
  /v\
 // \\ 
Linux User  /(   )\  [204016]
GPG Key ID   ^^-^^   http://soulinfo.com/~hugang/hugang.asc


Re: [patch] x86: fix ESP corruption CPU bug (take 2)

2005-03-14 Thread Andrew Morton
Stas Sergeev <[EMAIL PROTECTED]> wrote:
>
> Alan Cox wrote:
>  >> Alan, can you please apply that to an -ac
>  >> tree?
>  > Ask Andrew Morton as it belongs in the -mm tree
>  Actually I tried that already.

I added this patch to -mm.

> Andrew
>  had nothing against that patch personally,
>  as well as Linus, but after all that didn't
>  work:
>  http://lkml.org/lkml/2005/1/3/260
> 
>  So it can't be applied to -mm, and not
>  depending on the kgdb-ga patch allowed for
>  some extra optimization.

The rule is:

- If the patch patches something which is in Linus's kernel, prepare a
  diff against Linus's latest kernel.

- If the patch patches something which is only in -mm, prepare a patch
  against -mm.

In this case, I merged the patch prior to the kgdb patch and then fixed
up the fallout.

(If that causes kgdb to break in non-obvious-to-me ways then I might come
calling "help".  We'll see)


[PATCH] PPC64 iSeries: cleanup viopath

2005-03-14 Thread Stephen Rothwell
Hi Andrew,

Since you brought this file to my attention, I figured I might as well do
some simple cleanups.  This patch does:
- single bit int bitfields are a bit suspect and Andrew pointed
  out recently that they are probably slower to access than ints,
  so make them plain ints
- get rid of some more studly caps
- define the semaphore and the atomic in struct alloc_parms
  rather than pointers to them since we just allocate them on
  the stack anyway.
- one small white space cleanup
- use the HvLpIndexInvalid constant instead of its value

Built and booted on iSeries (which is the only place it is used).
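
A minimal sketch of why single-bit int bitfields are "a bit suspect". The
struct and function names below are hypothetical and the bitfield is
declared explicitly signed, on the assumption that the compiler treats a
plain "int x:1" as signed (GCC does by default; the C standard leaves it
implementation-defined). A one-bit signed field has only a sign bit, so it
cannot hold the value 1.

```c
/* Hypothetical stand-ins for the two layouts of struct viopathStatus. */
struct flags_bitfield {
    signed int   is_open:1;    /* one bit, and that bit is the sign bit */
    unsigned int is_active:1;  /* one bit, holds 0 or 1 */
};

struct flags_plain {
    int is_open;               /* what the patch switches to */
};

/* Store v in the signed one-bit field and read it back. */
static int store_in_signed_bit(int v)
{
    struct flags_bitfield f = { 0, 0 };
    f.is_open = v;             /* 1 does not fit in a 1-bit signed field */
    return f.is_open;
}

/* Store v in the unsigned one-bit field and read it back. */
static int store_in_unsigned_bit(unsigned int v)
{
    struct flags_bitfield f = { 0, 0 };
    f.is_active = v;
    return (int)f.is_active;
}

/* Store v in a plain int member and read it back. */
static int store_in_plain_int(int v)
{
    struct flags_plain f = { 0 };
    f.is_open = v;
    return f.is_open;
}
```

Code that only tests the flag for truth still works with the signed
bitfield, but a comparison such as "isOpen == 1" would not, which is one
reason plain ints (or unsigned bitfields) are the safer choice.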

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/

diff -ruNp linus/arch/ppc64/kernel/viopath.c linus-cleanup.1/arch/ppc64/kernel/viopath.c
--- linus/arch/ppc64/kernel/viopath.c	2005-03-13 04:07:42.0 +1100
+++ linus-cleanup.1/arch/ppc64/kernel/viopath.c	2005-03-15 14:02:48.0 +1100
@@ -42,6 +42,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -56,8 +57,8 @@
  * But this allows for other support in the future.
  */
 static struct viopathStatus {
-	int isOpen:1;	/* Did we open the path?*/
-	int isActive:1;	/* Do we have a mon msg outstanding */
+	int isOpen;	/* Did we open the path?*/
+	int isActive;	/* Do we have a mon msg outstanding */
 	int users[VIO_MAX_SUBTYPES];
 	HvLpInstanceId mSourceInst;
 	HvLpInstanceId mTargetInst;
@@ -81,10 +82,10 @@ static void handleMonitorEvent(struct Hv
  * blocks on the semaphore and the handler posts the semaphore.  However,
  * if system_state is not SYSTEM_RUNNING, then wait_atomic is used ...
  */
-struct doneAllocParms_t {
-	struct semaphore *sem;
+struct alloc_parms {
+	struct semaphore sem;
 	int number;
-	atomic_t *wait_atomic;
+	atomic_t wait_atomic;
 	int used_wait_atomic;
 };
 
@@ -97,9 +98,9 @@ static u8 viomonseq = 22;
 /* Our hosting logical partition.  We get this at startup
  * time, and different modules access this variable directly.
  */
-HvLpIndex viopath_hostLp = 0xff;	/* HvLpIndexInvalid */
+HvLpIndex viopath_hostLp = HvLpIndexInvalid;
 EXPORT_SYMBOL(viopath_hostLp);
-HvLpIndex viopath_ourLp = 0xff;
+HvLpIndex viopath_ourLp = HvLpIndexInvalid;
 EXPORT_SYMBOL(viopath_ourLp);
 
 /* For each kind of incoming event we set a pointer to a
@@ -200,7 +201,7 @@ EXPORT_SYMBOL(viopath_isactive);
 
 /*
  * We cache the source and target instance ids for each
- * partition.  
+ * partition.
  */
 HvLpInstanceId viopath_sourceinst(HvLpIndex lp)
 {
@@ -450,36 +451,33 @@ static void vio_handleEvent(struct HvLpE
 
 static void viopath_donealloc(void *parm, int number)
 {
-	struct doneAllocParms_t *parmsp = (struct doneAllocParms_t *)parm;
+	struct alloc_parms *parmsp = parm;
 
 	parmsp->number = number;
 	if (parmsp->used_wait_atomic)
-		atomic_set(parmsp->wait_atomic, 0);
+		atomic_set(&parmsp->wait_atomic, 0);
 	else
-		up(parmsp->sem);
+		up(&parmsp->sem);
 }
 
 static int allocateEvents(HvLpIndex remoteLp, int numEvents)
 {
-	struct doneAllocParms_t parms;
-	DECLARE_MUTEX_LOCKED(Semaphore);
-	atomic_t wait_atomic;
+	struct alloc_parms parms;
 
 	if (system_state != SYSTEM_RUNNING) {
 		parms.used_wait_atomic = 1;
-		atomic_set(&wait_atomic, 1);
-		parms.wait_atomic = &wait_atomic;
+		atomic_set(&parms.wait_atomic, 1);
 	} else {
 		parms.used_wait_atomic = 0;
-		parms.sem = &Semaphore;
+		init_MUTEX_LOCKED(&parms.sem);
 	}
 	mf_allocate_lp_events(remoteLp, HvLpEvent_Type_VirtualIo, 250,	/* It would be nice to put a real number here! */
 			    numEvents, &viopath_donealloc, &parms);
 	if (system_state != SYSTEM_RUNNING) {
-		while (atomic_read(&wait_atomic))
+		while (atomic_read(&parms.wait_atomic))
 			mb();
 	} else
-		down(&Semaphore);
+		down(&parms.sem);
 	return parms.number;
 }
 
@@ -558,8 +556,7 @@ int viopath_close(HvLpIndex remoteLp, in
 	unsigned long flags;
 	int i;
 	int numOpen;
-	struct doneAllocParms_t doneAllocParms;
-	DECLARE_MUTEX_LOCKED(Semaphore);
+	struct alloc_parms parms;
 
 	if ((remoteLp >= HvMaxArchitectedLps) || (remoteLp == HvLpIndexInvalid))
 		return -EINVAL;
@@ -580,11 +577,11 @@ int viopath_close(HvLpIndex remoteLp, in
 
 	spin_unlock_irqrestore(, flags);
 
-	doneAllocParms.used_wait_atomic = 0;
-	doneAllocParms.sem = &Semaphore;
+	parms.used_wait_atomic = 0;
+	init_MUTEX_LOCKED(&parms.sem);
 	mf_deallocate_lp_events(remoteLp, HvLpEvent_Type_VirtualIo,
-				  numReq, &viopath_donealloc, &doneAllocParms);
-

Re: swsusp_restore crap

2005-03-14 Thread Benjamin Herrenschmidt
On Tue, 2005-03-15 at 14:24 +1100, Benjamin Herrenschmidt wrote:
> Hi Pavel !
> 
> Please kill that swsusp_restore() call that itself calls
> flush_tlb_global(), it's junk. First, the flush_tlb_global() thing is
> arch specific, and that's all swsusp_restore() does. Then, the asm just
> calls this before returning to C code, so it makes no sense to have a
> hook there. The x86 asm can have its own call to some arch stuff if it
> wants or just do the tlb flush in asm...

Better, here is a patch... (note: flush_tlb_global() is an x86'ism that
doesn't exist on ppc and thus breaks the compile. It has no place in
the generic code imho; it should be clearly defined as the
responsibility of the asm code).

--

This patch removes the quite x86-specific swsusp_restore() hook from the
generic swsusp code and moves it to arch/i386. This also fixes build on
ppc with swsusp enabled.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>

Index: linux-work/arch/i386/power/swsusp.S
===================================================================
--- linux-work.orig/arch/i386/power/swsusp.S	2005-03-15 11:56:17.0 +1100
+++ linux-work/arch/i386/power/swsusp.S	2005-03-15 14:29:09.0 +1100
@@ -58,5 +58,5 @@
 	movl saved_context_edi, %edi
 
 	pushl saved_context_eflags ; popfl
-	call swsusp_restore
+	call __swsusp_flush_tlb
 	ret
Index: linux-work/arch/i386/power/cpu.c
===================================================================
--- linux-work.orig/arch/i386/power/cpu.c	2005-03-15 11:56:17.0 +1100
+++ linux-work/arch/i386/power/cpu.c	2005-03-15 14:28:26.0 +1100
@@ -147,6 +147,15 @@
 	__restore_processor_state(&saved_context);
 }
 
+asmlinkage int __swsusp_flush_tlb(void)
+{
+	BUG_ON (nr_copy_pages_check != nr_copy_pages);
+	
+	/* Even mappings of "global" things (vmalloc) need to be fixed */
+	__flush_tlb_global();
+	return 0;
+}
+
 /* Needed by apm.c */
 EXPORT_SYMBOL(save_processor_state);
 EXPORT_SYMBOL(restore_processor_state);
Index: linux-work/kernel/power/swsusp.c
===================================================================
--- linux-work.orig/kernel/power/swsusp.c	2005-03-15 12:00:13.0 +1100
+++ linux-work/kernel/power/swsusp.c	2005-03-15 14:29:19.0 +1100
@@ -907,15 +907,6 @@
 }
 
 
-asmlinkage int swsusp_restore(void)
-{
-	BUG_ON (nr_copy_pages_check != nr_copy_pages);
-	
-	/* Even mappings of "global" things (vmalloc) need to be fixed */
-	__flush_tlb_global();
-	return 0;
-}
-
 int swsusp_resume(void)
 {
 	int error;




Re: Question about initramfs

2005-03-14 Thread Alexander E. Patrakov
Jim Gifford wrote:

> Question: Initramfs is going to replace initrd, but I haven't seen
> anyone explain how to get modules that are built during the build
> process into the initramfs archive. Has somebody done this, or is
> this still a work in progress?

Easy.

1) Unpack a vanilla kernel and build and install it as you usually do for a
system that doesn't need initramfs.

make menuconfig
make
make modules_install
cp arch/i386/boot/bzImage /boot/linux-2.6

2) Make a temporary directory (say, "initramfs") and put all files that you
want to go to your initramfs there. Don't forget the "/init" file, it is
used as a starting point for initramfs.

3) Make the initramfs image:

cd initramfs
find . | cpio -o -H newc | gzip -9 >/boot/initramfs-2.6.cpio.gz

4) Add /boot/linux-2.6 and /boot/initramfs-2.6.cpio.gz to your LILO or GRUB
as you would normally do with a kernel image and the initrd:

image=/boot/linux-2.6
label="Linux"
initrd=/boot/initramfs-2.6.cpio.gz
root=/dev/hda1  # if your initramfs "/init" script understands this
read-only   # if your initramfs "/init" script understands this

5) Upon reboot, the kernel will automatically determine that the image is
really an initramfs, not an initrd.
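
Step 5 works because the image type can be recognised from its first
bytes. The sketch below is not the kernel's detection code; it only
illustrates the two signatures involved, on the assumption that the image
was built as in step 3: a gzip stream starts with the bytes 0x1f 0x8b, and
a "newc" cpio archive (what "cpio -H newc" writes) starts with the ASCII
magic "070701" once decompressed. The function names are made up for this
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the buffer starts with the two-byte gzip signature. */
static bool looks_like_gzip(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

/* True if the (already decompressed) buffer starts with a newc cpio header. */
static bool looks_like_newc_cpio(const unsigned char *buf, size_t len)
{
    return len >= 6 && memcmp(buf, "070701", 6) == 0;
}
```

A quick sanity check before rebooting is to run the first test on the
first bytes of /boot/initramfs-2.6.cpio.gz and the second on the output
of gunzip.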

-- 
Alexander E. Patrakov



Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Albert Cahalan wrote:

> When the vsyscall page is created, copy the one needed function
> into it. The kernel is already self-modifying in many places; this
> is nothing new.

AFAIK this will only work on ia32 and x86_64 and definitely not
on ia64. Who knows about the other platforms?



swsusp_restore crap

2005-03-14 Thread Benjamin Herrenschmidt
Hi Pavel !

Please kill that swsusp_restore() call that itself calls
flush_tlb_global(), it's junk. First, the flush_tlb_global() thing is
arch specific, and that's all swsusp_restore() does. Then, the asm just
calls this before returning to C code, so it makes no sense to have a
hook there. The x86 asm can have its own call to some arch stuff if it
wants or just do the tlb flush in asm...

Ben.




Re: huge filesystems

2005-03-14 Thread Andrew Morton
jmerkey <[EMAIL PROTECTED]> wrote:
>
> I am running the DSFS file system as a 7 TB file system on 2.6.9.

On a 32-bit CPU?

> There are a host of problems with the current VFS,

I don't recall you reporting any of them.  How can we expect to fix
anything if we aren't told about it?

> and I have gotten around most of them
>  by **NOT** using the linux page cache interface.

Well that won't fly.


The VFS should support devices up to 16TB on 32-bit CPUs.  If you know of
scenarios in which it fails to do that, please send a bug report.



Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Jon Smirl
On Mon, 14 Mar 2005 12:42:27 +1100, Peter Chubb
<[EMAIL PROTECTED]> wrote:
> > "Jon" == Jon Smirl <[EMAIL PROTECTED]> writes:
> 
> >>  The scenario I'm thinking about with these patches are things like
> >> low-latency user-level networking between nodes in a cluster, where
> >> for good performance even with a kernel driver you don't want to
> >> share your interrupt line with anything else.
> 
> Jon> The code needs to refuse to install if the IRQ line is shared.
> 
> It does.  The request_irq() call explicitly does not include SA_SHARED
> in its flags, so if the line is shared, it'll return an error to user
> space when the driver tries to open the file representing the interrupt.

Please put some big comments warning people about adding SA_SHARED. I
can easily see someone thinking that they are fixing a bug by adding
it. I'd probably even write a paragraph about what will happen if
SA_SHARED is added.

> 
> Jon> Also what about SMP, if you shut the IRQ off on one CPU isn't it
> Jon> still enabled on all of the others?
> 
> Nope.   disable_irq_nosync() talks to the interrupt controller, which
> is common to all the processors.  The main problem is that it's slow,
> because it has to go off-chip.
> 
> --
> Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
> The technical we do immediately,  the political takes *forever*
> 


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: User mode drivers: part 1, interrupt handling (patch for 2.6.11)

2005-03-14 Thread Jon Smirl
On Mon, 14 Mar 2005 13:33:31 +, Alan Cox <[EMAIL PROTECTED]> wrote:
> On Llu, 2005-03-14 at 00:02, Peter Chubb wrote:
> > I can see there'd be problems if the code allowed shared interrupts,
> > but it doesn't.
> 
> If you don't allow shared IRQs it's useless, if you do allow shared
> IRQs it deadlocks. Take your pick 8)
> 
> As to your comment about needing to do a few more I/O operations I
> agree. However if your need is for speed then you might want to just
> write a small IRQ helper module for the kernel or extend the syntax I
> proposed a little (its conveniently trivial to generate native code from
> this).

The concept of passing in a little structure telling how to
acknowledge an interrupt is a very good one. I'd like to see it added
as a kernel feature so that drivers could start being converted to it.
This is a big deal for Xen since Xen has the same problem with
forwarded IRQs. Xen would pass the little structure from the domain to
the supervisor so that the supervisor could cut off the IRQ if the
domain fails.

> 
> There isn't much you can do about the status read without MWI on most
> chip designs (some get it right by posting status to system memory but
> not many)
> 
> Alan
> 
> 


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: [PATCH][2/2] SquashFS

2005-03-14 Thread Matt Mackall
On Mon, Mar 14, 2005 at 04:30:33PM +, Phillip Lougher wrote:

> +config SQUASHFS_1_0_COMPATIBILITY
> + bool "Include support for mounting SquashFS 1.x filesystems"

How common are these? It would be nice not to bring in legacy code.

> +#define SERROR(s, args...)   do { \
> + if (!silent) \
> + printk(KERN_ERR "SQUASHFS error: "s, ## args);\
> + } while(0)

Why would we ever want to be silent about something of KERN_ERR
severity? Isn't that a better job for klogd?

> +#define SQUASHFS_MAGIC   0x73717368
> +#define SQUASHFS_MAGIC_SWAP  0x68737173

Again, what's the story here? Is this purely endian conversion or do
filesystems of both endian persuasions exist? If the latter, let's not
keep that legacy. Pick an order, and use endian conversion functions
unconditionally everywhere.
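
For reference, the two magic values are byte-reversals of each other,
which suggests the second exists to recognise an image written with the
opposite byte order. The sketch below shows the "pick an order" approach:
always decode the on-disk bytes as little-endian, so only one magic is
needed on any host. swab32_sketch() and load_le32() are hypothetical
helper names; the kernel has its own endian conversion helpers for this.

```c
#include <stdint.h>

#define SQUASHFS_MAGIC      0x73717368u
#define SQUASHFS_MAGIC_SWAP 0x68737173u

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swab32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Decode four on-disk bytes as little-endian, independent of the host. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With a fixed on-disk order, a superblock check reduces to comparing
load_le32() of the first four bytes against SQUASHFS_MAGIC, and the
_SWAP constant is no longer needed.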

> +#define SQUASHFS_COMPRESSED_SIZE_BLOCK(B)(((B) & \
> + ~SQUASHFS_COMPRESSED_BIT_BLOCK) ? (B) & \
> + ~SQUASHFS_COMPRESSED_BIT_BLOCK : SQUASHFS_COMPRESSED_BIT_BLOCK)

Shortening all these macro names would be nice..

> +typedef unsigned int		squashfs_block;
> +typedef long long		squashfs_inode;

Eh? Seems we can have many more inodes than blocks? What sorts of
volume limits do we have here?

> +	unsigned int		s_major:16;
> +	unsigned int		s_minor:16;

What's going on here? s_minor's not big enough for modern minor
numbers.

> +typedef struct {
> +	unsigned int		index:27;
> +	unsigned int		start_block:29;
> +	unsigned char		size;

Eep. Not sure how bit-fields handle crossing word boundaries, would be
surprised if this were very portable.

> + * macros to convert each packed bitfield structure from little endian to big
> + * endian and vice versa.  These are needed when creating or using a filesystem
> + * on a machine with different byte ordering to the target architecture.
> + *
> + */
> +
> +#define SQUASHFS_SWAP_SUPER_BLOCK(s, d) {\
> + SQUASHFS_MEMSET(s, d, sizeof(squashfs_super_block));\
> + SQUASHFS_SWAP((s)->s_magic, d, 0, 32);\
> + SQUASHFS_SWAP((s)->inodes, d, 32, 32);\
> + SQUASHFS_SWAP((s)->bytes_used, d, 64, 32);\
> + SQUASHFS_SWAP((s)->uid_start, d, 96, 32);\
> + SQUASHFS_SWAP((s)->guid_start, d, 128, 32);\
> + SQUASHFS_SWAP((s)->inode_table_start, d, 160, 32);\
> + SQUASHFS_SWAP((s)->directory_table_start, d, 192, 32);\
> + SQUASHFS_SWAP((s)->s_major, d, 224, 16);\
> + SQUASHFS_SWAP((s)->s_minor, d, 240, 16);\
> + SQUASHFS_SWAP((s)->block_size_1, d, 256, 16);\
> + SQUASHFS_SWAP((s)->block_log, d, 272, 16);\
> + SQUASHFS_SWAP((s)->flags, d, 288, 8);\
> + SQUASHFS_SWAP((s)->no_uids, d, 296, 8);\
> + SQUASHFS_SWAP((s)->no_guids, d, 304, 8);\
> + SQUASHFS_SWAP((s)->mkfs_time, d, 312, 32);\
> + SQUASHFS_SWAP((s)->root_inode, d, 344, 64);\
> + SQUASHFS_SWAP((s)->block_size, d, 408, 32);\
> + SQUASHFS_SWAP((s)->fragments, d, 440, 32);\
> + SQUASHFS_SWAP((s)->fragment_table_start, d, 472, 32);\
> +}

Are those positions in bits? If you're going to go to the trouble of
swapping the whole thing, I think it'd be easier to just unpack and
endian-convert the thing so that we didn't have the overhead of
bitfields and unpacking except at read/write time. Something like:

void pack(void *src, void *dest, pack_table_t *e);
void unpack(void *src, void *dest, pack_table_t *e);
size_t pack_size(pack_table_t);

where e is an array containing basically the info you have in the
above macros for each element: offset into unpacked structure,
starting bit in packed structure, and packed bits.
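
A rough sketch of what such a table-driven pack/unpack could look like.
Only the three function names come from the proposal above; the layout of
pack_table_t, the const qualifiers, the zero-width terminator, the bit
numbering (bit i of the packed image is bit i%8 of byte i/8) and the
choice of uint64_t slots in the unpacked structure are all assumptions
made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per field; an entry with bits == 0 ends the table. */
typedef struct {
    size_t   offset;     /* byte offset of a uint64_t slot in the unpacked struct */
    unsigned start_bit;  /* first bit of the field in the packed image */
    unsigned bits;       /* field width in bits, 1..64 */
} pack_table_t;

/* Packed image (src) -> unpacked structure (dest). */
static void unpack(const void *src, void *dest, const pack_table_t *e)
{
    const uint8_t *p = src;

    for (; e->bits; e++) {
        uint64_t v = 0;
        for (unsigned i = 0; i < e->bits; i++) {
            unsigned bit = e->start_bit + i;
            if ((p[bit / 8] >> (bit % 8)) & 1)
                v |= (uint64_t)1 << i;
        }
        memcpy((uint8_t *)dest + e->offset, &v, sizeof v);
    }
}

/* Unpacked structure (src) -> packed image (dest). */
static void pack(const void *src, void *dest, const pack_table_t *e)
{
    uint8_t *p = dest;

    for (; e->bits; e++) {
        uint64_t v;
        memcpy(&v, (const uint8_t *)src + e->offset, sizeof v);
        for (unsigned i = 0; i < e->bits; i++) {
            unsigned bit = e->start_bit + i;
            if ((v >> i) & 1)
                p[bit / 8] |= (uint8_t)(1u << (bit % 8));
            else
                p[bit / 8] &= (uint8_t)~(1u << (bit % 8));
        }
    }
}

/* Number of bytes the packed image occupies. */
static size_t pack_size(const pack_table_t *e)
{
    unsigned end = 0;

    for (; e->bits; e++)
        if (e->start_bit + e->bits > end)
            end = e->start_bit + e->bits;
    return (end + 7) / 8;
}
```

Because the packed image is addressed bit by bit, the result does not
depend on host byte order or on how the compiler lays out bitfields, and
the in-memory structure can use ordinary integer members.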

-- 
Mathematics is the supreme nostalgia of our time.


Re: Serious problems with HFS+

2005-03-14 Thread Matt Mackall
On Mon, Mar 14, 2005 at 11:18:49AM +0100, Roman Zippel wrote:
> Hi,
> 
> On Sun, 13 Mar 2005, Matt Mackall wrote:
> 
> > I've noticed a few problems with HFS+ support in recent kernels on
> > another user's machine running Ubuntu (Warty) running
> > 2.6.8.1-3-powerpc. I'm not in a position to extensively test or fix
> > either of these problem because of the fs tools situation so I'm just
> > passing this on.
> > 
> > First, it reports inappropriate blocks to stat(2). It uses 4096 byte
> > blocks rather than 512 byte blocks which stat callers are expecting.
> > This seriously confuses du(1) (and me, for a bit). Looks like it may
> > be forgetting to set s_blocksize_bits.
> 
> This should be fixed since 2.6.10.
> 
> > Second, if an HFS+ filesystem mounted via Firewire or USB becomes
> > detached, the filesystem appears to continue working just fine. I can
> > find on the entire tree, despite memory pressure. I can even create
> > new files that continue to appear in directory listings! Writes to
> > such files succeed (they're async, of course) and the typical app is
> > none the wiser. It's only when apps attempt to read later that they
> > encounter problems. It turns out that various apps including scp
> > ignore IO errors on read and silently copy zero-filled files to the
> > destination. So I got this report as "why aren't the pictures I took
> > off my camera visible on my website?"
> 
> HFS+ metadata is also in the page cache, so as long as everything is 
> cached, HFS+ won't notice a problem.

It's failing to notice errors at sync time or when such pages get
flushed due to memory pressure.
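
On the application side, the "silently copy zero-filled files" failure
mentioned in the report comes from ignoring the return value of read(). A
minimal sketch of a copy loop that does propagate I/O errors follows;
copy_fd() is a name made up here and this is not how scp is actually
written.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything from in_fd to out_fd.
 * Returns 0 on success, or -1 with errno set if any read or write fails.
 */
static int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry the read */
            return -1;             /* e.g. EIO from a vanished device */
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;              /* handle short writes */
        }
    }
}
```

This only helps once the filesystem actually reports the error; a caller
that needs to know its data reached the device would also have to check
the result of fsync() on the destination.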

> > This is obviously a really nasty failure mode. At the very least, open
> > of new files should fail with -EIO. Preferably the fs should force a
> > read-only remount on IO errors. Given that the vast majority of HFS+
> > filesystems Linux is likely to be used with are on hotpluggable media,
> > I think this FS should be marked EXPERIMENTAL until such integrity
> > problems are addressed.
> 
> Currently nobody tells fs about such events, so even if I check for 
> write errors, it can still take a while until the error is detected.

It should catch up within the flush interval or at the next sync, at
least. And then fail all further writes. Consider the scenario of a
user sitting at their laptop when a power glitch offlines their
external drive. They can copy files onto it for the next hour, delete
the originals and be completely unaware that anything is wrong unless
they happen to check dmesg.

> It would be nice if the fs would be told about plug/unplug events, e.g. 
> HFS+ could check the mount count to see if it was connected to a different 
> host in the meantime and react appropriately.

An FS-level callback when the underlying block device went for a walk
would be a nice hook just about everywhere. And at least a start on
the problem.

-- 
Mathematics is the supreme nostalgia of our time.


[PATCH 2.6] fix mmap() return value to conform POSIX

2005-03-14 Thread Gordon Jin
This patch fixes 2 return values in mmap() to conform to the POSIX spec:

[EINVAL]
The value of len is zero.

[ENOMEM]
MAP_FIXED was specified, and the range [addr,addr+len) exceeds
that allowed for the address space of a process; or, if
MAP_FIXED was not specified and there is insufficient room in
the address space to effect the mapping.

--- linux-2.6.11.3/mm/mmap.c.orig	2005-03-14 13:20:11.0 -0800
+++ linux-2.6.11.3/mm/mmap.c	2005-03-14 17:24:37.0 -0800
@@ -897,12 +897,12 @@ unsigned long do_mmap_pgoff(struct file 
 		prot |= PROT_EXEC;
 
 	if (!len)
-		return addr;
+		return -EINVAL;
 
 	/* Careful about overflows.. */
 	len = PAGE_ALIGN(len);
 	if (!len || len > TASK_SIZE)
-		return -EINVAL;
+		return -ENOMEM;
 
 	/* offset overflow? */
 	if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
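
From user space, the first of the two changes can be observed with a
small probe like the one below. It is a sketch only:
mmap_zero_length_errno() is a name invented here, and the expectation
that a zero-length anonymous mapping fails with EINVAL assumes a kernel
that behaves as this patch proposes.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask for a zero-length anonymous mapping.
 * Returns 0 if the call reported no error (the pre-patch behaviour,
 * where addr was returned), otherwise the errno value it failed with.
 */
static int mmap_zero_length_errno(void)
{
    errno = 0;
    void *p = mmap(NULL, 0, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
        return 0;
    return errno;
}
```

On a kernel with the POSIX-conforming behaviour the function returns
EINVAL.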




Re: ancient portmap segfault

2005-03-14 Thread Mark Studebaker
Andi,
thanks for the response.
The code forks immediately and the new process segfaults immediately.
From an inspection of 'strace -f' on a working version, the next call
would have been setsid(). (The library call in the code is daemon(0,0).)
The original Makefile has an LDFLAG of -N (OMAGIC: make the text section
writable, don't page-align the data section. No idea why.)

If I compile with ancient gcc/ld,
it works after compiling without -N and segfaults when compiling with -N.
If I compile with a recent gcc/ld, it works fine.
Here's an objdump of the segfaulting portmap:

 objdump -x /usr/sbin/portmap
/usr/sbin/portmap: file format a.out-i386-linux
/usr/sbin/portmap
architecture: i386, flags 0x0002:
EXEC_P
start address 0x
Sections:
Idx Name  Size  VMA   LMA   File off  Algn
 0 .text 0f7c      0020  2**2
 CONTENTS, ALLOC, LOAD, CODE
 1 .data 0110  0f7c  0f7c  0f9c  2**2
 CONTENTS, ALLOC, LOAD, DATA
 2 .bss  0018  108c  108c    2**2
 ALLOC
SYMBOL TABLE:
no symbols
---
and here's the objdump of the test without -N
 objdump -h a.out
a.out: file format a.out-i386-linux
Sections:
Idx Name  Size  VMA   LMA   File off  Algn
 0 .text 1fe0  1020  1020  0020  2**3
 CONTENTS, ALLOC, LOAD, CODE
 1 .data 1000  3000  3000  2000  2**3
 CONTENTS, ALLOC, LOAD, DATA
 2 .bss    4000  4000    2**3
 ALLOC

--
so maybe the alignment difference is the problem?
as I said before, I have things working, only reporting this on the possibility
that it's a bug worth  investigating.
thanks
mds
Andi Kleen wrote:
> Mark Studebaker <[EMAIL PROTECTED]> writes:
>
>> I upgraded from 2.6.5 to 2.6.11.2 and my ancient (libc4 a.out) /sbin/portmap
>> from 1994 that's been running without complaint
>> on kernels for 11 years now consistently segfaults.
>> I upgraded to a version 4 RPM (circa 2002) and that fixed it.
>> If some compatibility was broken on purpose, that's fine, although I couldn't
>> find anything in the kernel docs.
>> I know, I should upgrade everything, but that can break a lot of things too...
>> Thought I'd mention it though in case it's a bug or somebody else has the same
>> problem.
>
> It's probably a bug, but your bug report doesn't have enough details
> to track it down. Do you have an a.out strace and could send an strace log
> with the segfault and the last tens of system calls before it?
>
> -Andi



Re: [KBUILD] Bug in make deb-pkg when using seperate source and object directories

2005-03-14 Thread Ajay Patel
Sam,

I had a similar problem building binrpm-pkg.
Try following patch. It worked for me.

Thanks
Ajay

Index: Makefile
===
RCS file: ./scripts/package/Makefile,v
retrieving revision 1.2
diff -d -c -5 -p -r1.2 Makefile
*** Makefile25 Feb 2005 22:35:22 -  1.2
--- Makefile14 Mar 2005 19:56:06 -
*** clean-files := $(objtree)/kernel.spec
*** 57,67 
  .PHONY: binrpm-pkg
  $(objtree)/binkernel.spec: $(MKSPEC) $(srctree)/Makefile
$(CONFIG_SHELL) $(MKSPEC) prebuilt > $@

  binrpm-pkg: $(objtree)/binkernel.spec
!   $(MAKE)
set -e; \
$(CONFIG_SHELL) $(srctree)/scripts/mkversion > $(objtree)/.tmp_version
set -e; \
mv -f $(objtree)/.tmp_version $(objtree)/.version
[EMAIL PROTECTED] -d $(objtree)/../../pkgdir ] || mkdir -p 
$(objtree)/../../pkgdir
--- 57,67 
  .PHONY: binrpm-pkg
  $(objtree)/binkernel.spec: $(MKSPEC) $(srctree)/Makefile
$(CONFIG_SHELL) $(MKSPEC) prebuilt > $@

  binrpm-pkg: $(objtree)/binkernel.spec
!   $(MAKE) KBUILD_SRC=
set -e; \
$(CONFIG_SHELL) $(srctree)/scripts/mkversion > $(objtree)/.tmp_version
set -e; \
mv -f $(objtree)/.tmp_version $(objtree)/.version
[EMAIL PROTECTED] -d $(objtree)/../../pkgdir ] || mkdir -p 
$(objtree)/../../pkgdir
*** clean-files += $(objtree)/binkernel.spec
*** 74,84 
  # Deb target
  # ---
  #
  .PHONY: deb-pkg
  deb-pkg:
!   $(MAKE)
$(CONFIG_SHELL) $(srctree)/scripts/package/builddeb

  clean-dirs += $(objtree)/debian/


--- 74,84 
  # Deb target
  # ---
  #
  .PHONY: deb-pkg
  deb-pkg:
!   $(MAKE) KBUILD_SRC=
$(CONFIG_SHELL) $(srctree)/scripts/package/builddeb

  clean-dirs += $(objtree)/debian/
  $(MAKE)



On Sun, 13 Mar 2005 01:09:41 -0500, Ryan Anderson <[EMAIL PROTECTED]> wrote:
> Sam,
> 
> When running "make O=something deb-pkg", I get a failure that claims I
> haven't configured my kernel (I have).  Running it a second time tells
> me to run "make mrproper"  (include/linux/version.h got built on the
> first run)
> 
> I did some preliminary poking around, but kbuild is still, well, mostly
> magic to me - I can't see where the object directory is getting lost.
> 
> Think you can take a look?  (Note, this failure shouldn't require
> anything Debian specific on your system to trigger - it's failing, as
> far as I can tell, on the $(MAKE) right before the call build the
> builddeb script, so it should be easy to reproduce)
> 
> The log of when I run it follows:
> 
> [EMAIL PROTECTED] ~/dev/linux/local-quilt$ blocal deb-pkg
> make
> make -C /home/ryan/dev/linux/local-quilt
> O=/home/ryan/dev/linux/output/local
> Makefile:487: .config: No such file or directory
>   Using /home/ryan/dev/linux/local-quilt as source for kernel
>   CHK include/linux/version.h
>   UPD include/linux/version.h
>   SYMLINK include/asm -> include/asm-i386
>   HOSTCC  scripts/basic/fixdep
>   HOSTCC  scripts/basic/split-include
>   HOSTCC  scripts/basic/docproc
>   SHIPPED scripts/kconfig/zconf.tab.h
>   SHIPPED scripts/kconfig/zconf.tab.c
>   SHIPPED scripts/kconfig/lex.zconf.c
>   HOSTCC  scripts/kconfig/conf.o
>   HOSTCC  scripts/kconfig/mconf.o
>   HOSTCC  scripts/kconfig/zconf.tab.o
>   HOSTLD  scripts/kconfig/conf
> scripts/kconfig/conf -s arch/i386/Kconfig
> ***
> *** You have not yet configured your kernel!
> ***
> *** Please run some configurator (e.g. "make oldconfig" or
> *** "make menuconfig" or "make xconfig").
> ***
> make[6]: *** [silentoldconfig] Error 1
> make[5]: *** [silentoldconfig] Error 2
> make[4]: *** [include/linux/autoconf.h] Error 2
> make[3]: *** [all] Error 2
> make[2]: *** [deb-pkg] Error 2
> make[1]: *** [deb-pkg] Error 2
> make: *** [deb-pkg] Error 2
> 
> "blocal" is a simple wrapper to cut down on retyping things, it's just
> this:
> 
> [EMAIL PROTECTED] ~/dev/linux/local-quilt$ cat /home/ryan/bin/blocal
> #!/bin/bash -e
> 
> PWD=`pwd`
> 
> if [ "$PWD" != "/home/ryan/dev/linux/local-quilt" ]; then
> cd /home/ryan/dev/linux/local-quilt
> fi
> 
> make O=../output/local/ -j4 CC="ccache distcc" $*
> 
> --
> 
> Ryan Anderson
>   sometimes Pug Majere


Re: [PATCH][2/2] SquashFS

2005-03-14 Thread Andrew Morton
Phillip Lougher <[EMAIL PROTECTED]> wrote:
>
> [ on-disk bitfields ]
> 
> I've checked compatibility against Intel 32 and 64 bit architectures, 
>  PPC 32/64 bit, ARM, MIPS
>  and SPARC.  I've used compilers from 2.91.x up to 3.4...

hm, OK.  I remain a bit skeptical but it sounds like you're the expert.  I
guess if things later explode it will be pretty obvious, and the filesystem
will need rework.

One thing which I assume we don't know at this stage is whether all 27
architectures work as expected - you can bet ia64 does it differently ;)

How does one test that?  Create a filesystem-in-a-file via mksquashfs, then
transfer that to a different box, then try and mount and use it, I assume?

When you upissue these patches, please include in the changelog pointers to
the relevant userspace support tools - mksquashfs, fsck.squashfs, etc.  I
guess http://squashfs.sourceforge.net/ will suit.

Also, this filesystem seems to do the same thing as cramfs.  We'd need to
understand in some detail what advantages squashfs has over cramfs to
justify merging it.  Again, that is something which is appropriate to the
changelog for patch 1/1.


Re: [PATCH][RFC] Make /proc/ chmod'able

2005-03-14 Thread Albert Cahalan
On Tue, 2005-03-15 at 00:08 +0100, Bodo Eggert wrote:
> On Mon, 14 Mar 2005, Albert Cahalan wrote:
> > On Mon, 2005-03-14 at 10:42 +0100, Rene Scharfe wrote:
> > > Albert Cahalan wrote:
> 
> > > Why do you think users should not be allowed to chmod their processes' 
> > > /proc directories?  Isn't it similar to being able to chmod their home 
> > > directories?  They own both objects, after all (both conceptually and as 
> > > attributed in the filesystem).
> > 
> > This is, to use your own word, "cloaking". This would let
> > a bad user or even an unauthorized user hide from the admin.
> 
> NACK, the admin (and with the new inherited capabilities all users with 
> cap_???_override) can see all processes. Only users who don't need to know
> won't see the other user's processes.

Capabilities are too broken for most people to use. Normal users
do not get CAP_DAC_OVERRIDE by default anyway, for good reason.

> > Note that the admin hopefully does not normally run as root.
> 
> su1 and sudo exist.

This is a pain. Now every user will need sudo access,
and the sudoers file will have to disable requesting
passwords so that scripts will work without hassle.

> > Even if the admin were not running as a normal user, it is
> > expected that normal users can keep tabs on each other.
> > The admin may be sleeping. Social pressure is important to
> > prevent one user from sucking up all the memory and CPU time.
> 
> Privacy is important, too. Imagine each user can see the CEO (or the
> admin) executing "ee nakedgirl.jpg".

Obviously, he likes to have users see him do this.
He'd use a private machine if he wanted privacy.

> > > > Note: I'm the procps (ps, top, w, etc.) maintainer.
> > > > 
> > > > Probably I'd have to make /bin/ps run setuid root
> > > > to deal with this. (minor changes needed) The same
> > > > goes for /usr/bin/top, which I know is currently
> > > > unsafe and difficult to fix.
> 
> I used unpatched procps 3.1.11, and it worked for me, except pstree.

It does not work correctly.

Look, patches with this "feature" are called rootkits.
Think of the headlines: "Linux now with built-in rootkit".

> > > Why do ps and top need to be setuid root to deal with a restricted /proc? 
> > > What information in /proc/ needs to be available to any and all 
> > > users?
> > 
> > Anything provided by traditional UNIX and BSD systems
> > should be available.
> 
> e.g. the buffer overflow in sendmail? Or all the open relays? :)
> 
> The demands to security and privacy have increased. Linux should be able 
> to provide the requested privacy.

This really isn't about security. Privacy may be undesirable.
With privacy comes anti-social behavior. Supposing that the
users do get privacy, perhaps because they have paid for it:

Xen, UML, VM, VMware, separate computers

Going with separate computers is best. Don't forget to use
network traffic control to keep users from being able to
detect the network activity of other users.

> > Users who want privacy can get their
> > own computer. So, these need to work:
> > 
> > ps -ef
> > ps -el
> > ps -ej
> > ps axu
> > ps axl
> > ps axj
> > ps axv
> > w
> > top
> 
> Works as intended. Only pstree breaks, if init isn't visible.

They work like they do with a rootkit installed.
Traditional behavior has been broken.




Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread Albert Cahalan
On Mon, 2005-03-14 at 12:27 -0800, Matt Mackall wrote:
> On Mon, Mar 14, 2005 at 12:04:07PM -0800, john stultz wrote:
> > > > > > > > +static inline cycle_t read_timesource(struct timesource_t* ts)
> > > > > > > > +{
> > > > > > > > +   switch (ts->type) {
> > > > > > > > +   case TIMESOURCE_MMIO_32:
> > > > > > > > +   return (cycle_t)readl(ts->mmio_ptr);
> > > > > > > > +   case TIMESOURCE_MMIO_64:
> > > > > > > > +   return (cycle_t)readq(ts->mmio_ptr);
> > > > > > > > +   case TIMESOURCE_CYCLES:
> > > > > > > > +   return (cycle_t)get_cycles();
> > > > > > > > +   default:/* case: TIMESOURCE_FUNCTION */
> > > > > > > > +   return ts->read_fnct();
> > > > > > > > +   }
> > > > > > > > +}
> > > Well where we'd read an MMIO address, we'd simply set read_fnct to
> > > generic_timesource_mmio32 or so. And that function just does the read.
> > > So both that function and read_timesource become one-liners and we
> > > drop the conditional branches in the switch.
> > 
> > However the vsyscall/fsyscall bits cannot call in-kernel functions (as
> > they execute in userspace or a pseudo-userspace). As it stands now in my
> > design TIMESOURCE_FUNCTION timesources will not be usable for
> > vsyscall/fsyscall implementations, so I'm not sure if that's doable.
> > 
> > I'd be interested if you've got a way around that.
> 
> We can either stick all the generic mmio timer functions in the
> vsyscall page (they're tiny) or leave the vsyscall using type/ptr but
> have the kernel internally use only the function pointer. Someone
> who's more familiar with the vsyscall timer code should chime in here.

When the vsyscall page is created, copy the one needed function
into it. The kernel is already self-modifying in many places; this
is nothing new.
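
For readers following the thread, here is a minimal userspace sketch of the
function-pointer variant Matt describes, where each MMIO width gets its own
tiny reader and read_timesource() loses its switch. It is an illustration
only: the struct layout, the read_fn_t typedef and passing the timesource to
the callback are assumptions (the posted patch calls ts->read_fnct() with no
argument), and plain volatile loads stand in for the kernel's readl()/readq().

```c
#include <stdint.h>

typedef uint64_t cycle_t;

struct timesource;

/* Hypothetical callback type: takes the timesource so one generic
 * reader can serve any register address. */
typedef cycle_t (*read_fn_t)(const struct timesource *ts);

struct timesource {
    const volatile void *mmio_ptr;  /* register address for MMIO sources */
    read_fn_t read_fnct;            /* the only member callers go through */
};

/* Stand-in for a readl()-based reader: one 32-bit load. */
static cycle_t generic_timesource_mmio32(const struct timesource *ts)
{
    return (cycle_t)*(const volatile uint32_t *)ts->mmio_ptr;
}

/* Stand-in for a readq()-based reader: one 64-bit load. */
static cycle_t generic_timesource_mmio64(const struct timesource *ts)
{
    return (cycle_t)*(const volatile uint64_t *)ts->mmio_ptr;
}

/* No switch on ts->type any more: a single indirect call. */
static inline cycle_t read_timesource(const struct timesource *ts)
{
    return ts->read_fnct(ts);
}
```

Under this sketch, copying "the one needed function" into the vsyscall page
would mean copying whichever of the two small readers the active timesource
points at; how that copy is done is outside what the example shows.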



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][1/2] SquashFS

2005-03-14 Thread Matt Mackall
On Tue, Mar 15, 2005 at 12:47:23PM +1100, Nick Piggin wrote:
> Matt Mackall wrote:
> 
> >>+   for (;;) {
> >
> >while (1)
> 
> I always thought for (;;) was preferred. Or at least acceptable?

The for (;;) form has always struck me as needlessly clever and I've
known it to puzzle coworkers. I try to make my for loops fall into the
mold of simple initialize/test/advance. But no, I'm not aware of any
LKML consensus opinion on this particular point.

The assignment-in-if problem is a bit more serious as it exacerbates
the jammed-up-against-the-right-margin formatting issues.

-- 
Mathematics is the supreme nostalgia of our time.


Re: [PATCH 0/4] sparsemem intro patches

2005-03-14 Thread Andrew Morton
Dave Hansen <[EMAIL PROTECTED]> wrote:
>
>  The following four patches provide the last needed changes before the
>  introduction of sparsemem.  For a more complete description of what this
>  will do, please see this patch:
> 
>  
> http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

I don't know what to think about this.  Can you describe sparsemem a little
further, differentiate it from discontigmem and tell us why we want one? 
Is it for memory hotplug?  If so, how does it support hotplug?

To which architectures is this useful, and what is the attitude of the
relevant maintenance teams?

Quoting from the above patch:

> Sparsemem replaces DISCONTIGMEM when enabled, and it is hoped that
> it can eventually become a complete replacement.
> ...
> This patch introduces CONFIG_FLATMEM.  It is used in almost all
> cases where there used to be an #ifndef DISCONTIG, because
> SPARSEMEM and DISCONTIGMEM often have to compile out the same areas
> of code.

Would I be right to worry about increasing complexity, decreased
maintainability and generally increasing mayhem?

If a competent kernel developer who is not familiar with how all this code
hangs together wishes to acquaint himself with it, what steps should he
take?


Re: [PATCH][2/2] SquashFS

2005-03-14 Thread Phillip Lougher
On Tuesday, March 15, 2005, at 01:06  am, Andrew Morton wrote:
> Phillip Lougher <[EMAIL PROTECTED]> wrote:
> > @@ -0,0 +1,439 @@
>
> [lots of comments from patch 1/2 are applicable here]

OK.  Noted :-)

> > +#define SQUASHFS_MAX_FILE_SIZE ((long long) 1 << \
> > +   (SQUASHFS_MAX_FILE_SIZE_LOG - 1))
>
> 1LL would suit here.  Or a cast to loff_t.

OK

> > +typedef unsigned int   squashfs_block;
> > +typedef long long  squashfs_inode;
>
> squashfs_block_t and squashfs_inode_t, please.  If one must use typedefs...

OK
> > +typedef struct squashfs_super_block {
> > +   unsigned int    s_magic;
> > +   unsigned int    inodes;
> > +   unsigned int    bytes_used;
> > +   unsigned int    uid_start;
> > +   unsigned int    guid_start;
> > +   unsigned int    inode_table_start;
> > +   unsigned int    directory_table_start;
> > +   unsigned int    s_major:16;
> > +   unsigned int    s_minor:16;
> > +   unsigned int    block_size_1:16;
> > +   unsigned int    block_log:16;
> > +   unsigned int    flags:8;
> > +   unsigned int    no_uids:8;
> > +   unsigned int    no_guids:8;
> > +   unsigned int    mkfs_time /* time of filesystem creation */;
> > +   squashfs_inode  root_inode;
> > +   unsigned int    block_size;
> > +   unsigned int    fragments;
> > +   unsigned int    fragment_table_start;
> > +} __attribute__ ((packed)) squashfs_super_block;
>
> Whoa.  Tons of bitfields in this file.  Are these on-disk data structures?
> If so, that's a problem for portability between architectures and possibly
> compiler versions.  It also introduces locking complexity.
>
> if they're in-core data structures then the bitfields are probably
> slower than using `int', as well.

They look pretty nasty, but are quite harmless really...

The structures represent on-disk structures.  Squashfs tries to cram as
much information into as small an area as possible on disk, which is why
they're using bitfields.

The structures are read into memory from disk into the bit field
structure, and the information is immediately transferred to more sane
'int' structures inside the inode or into private Squashfs data, and all
reads/writes take place from there.  No writes are made into the bit
fields, they're only used to temporarily 'parse' the packed data on disk.

I've done a lot of checking to ensure portability across architectures
and against different compiler versions.  Gcc uniformly uses two
representations for 'packed structures', one for little endian
architectures and one for big endian architectures.  Little endian
bitfield structures are packed low-byte high-byte order, allocating
bitfields from low bit to high bit in ints.  Big endian structures are
packed high-byte low-byte order, allocating bitfields from high bit to
low bit in ints (this incidentally generates structures in the bit/byte
order specified in the C source).  The filling is done this way on
different endian architectures as it allows the most efficient bit-field
access code to be generated for each endian architecture.

I've checked compatibility against Intel 32 and 64 bit architectures,
PPC 32/64 bit, ARM, MIPS and SPARC.  I've used compilers from 2.91.x up
to 3.4...
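
As an illustration of the two layouts described above, the sketch below
decodes the 4/12/8/8-bit base inode header from raw bytes with explicit
shifts and masks, once for the little endian layout (fields allocated from
the low bit up) and once for the big endian layout (fields allocated from
the high bit down). It is not code from the patch: the struct base_inode
type and the decode_le()/decode_be() helpers are hypothetical names, and the
bit positions simply follow the description in this mail.

```c
#include <stdint.h>

/* Plain-int copy of the header, in the spirit of the "more sane 'int'
 * structures" the mail mentions. Hypothetical type for illustration. */
struct base_inode {
    unsigned int inode_type;  /* 4 bits on disk */
    unsigned int mode;        /* 12 bits on disk */
    unsigned int uid;         /* 8 bits on disk, index into uid table */
    unsigned int guid;        /* 8 bits on disk, index into guid table */
};

/* Little endian layout: low byte first, fields allocated from bit 0 up. */
static struct base_inode decode_le(const uint8_t raw[4])
{
    uint32_t w = (uint32_t)raw[0] | (uint32_t)raw[1] << 8 |
                 (uint32_t)raw[2] << 16 | (uint32_t)raw[3] << 24;
    struct base_inode h;

    h.inode_type = w & 0xf;
    h.mode = (w >> 4) & 0xfff;
    h.uid = (w >> 16) & 0xff;
    h.guid = (w >> 24) & 0xff;
    return h;
}

/* Big endian layout: high byte first, fields allocated from bit 31 down. */
static struct base_inode decode_be(const uint8_t raw[4])
{
    uint32_t w = (uint32_t)raw[0] << 24 | (uint32_t)raw[1] << 16 |
                 (uint32_t)raw[2] << 8 | (uint32_t)raw[3];
    struct base_inode h;

    h.inode_type = (w >> 28) & 0xf;
    h.mode = (w >> 16) & 0xfff;
    h.uid = (w >> 8) & 0xff;
    h.guid = w & 0xff;
    return h;
}
```

Decoding this way does not depend on how a given compiler lays out
bitfields, which is the portability concern raised in the review; whether it
is worth the extra code compared with the packed structs is a design choice
the thread leaves open.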

> > +typedef struct {
> > +   unsigned int    inode_type:4;
> > +   unsigned int    mode:12; /* protection */
> > +   unsigned int    uid:8; /* index into uid table */
> > +   unsigned int    guid:8; /* index into guid table */
> > +} __attribute__ ((packed)) squashfs_base_inode_header;
>
> See, if one CPU is modifying `inode_type' while another CPU is modifying
> `mode', this struct can get trashed.

I agree.  This is why the structures are never written to.  Bit fields
are slow, I move the data out as soon as possible.


> > +/*
> > + * macros to convert each packed bitfield structure from little endian to big
> > + * endian and vice versa.  These are needed when creating or using a filesystem
> > + * on a machine with different byte ordering to the target architecture.
> > + *
> > + */
>
> hmm, OK..  Tell us more?

As mentioned previously, there are two packed bit-field representations,
one for big endian machines, and one for little endian machines.  Squashfs
for efficiency in embedded systems writes little endian filesystems (with
little endian bit field structures) for little endian targets, and big
endian filesystems for big endian targets.  However, to allow non-native
endian filesystems (i.e. where the host is little endian but the target is
big endian), to be mounted, Squashfs will swap the filesystem on a
different endian machine.

Squashfs at filesystem mount time determines if the filesystem is swapped
with respect to the host architecture.  If it is then the packed bit-field
structures read off disk are in the wrong endianness.  Immediately after
reading off disk, the structures are converted to the correct endianness 

Re: [swsusp/ppc] Re: What's going on here ?

2005-03-14 Thread Benjamin Herrenschmidt

> rjw and hugang did (pretty necessary) changes to base swsusp (pagedir
> table -> pagedir linklist), that unfortunately needed update to all
> the assembly parts. It was series 1/3 update core, i386 and x86-64,
> 2/3 update ppc, 3/3 introduce initramfs.
> 
> This is the offending patch I believe (but the version that was merged
> was From: me, without code changes).
> 
> I realized that patch does more than changing from table to linklist,
> but it looked mostly okay, so I forwarded it. Sorry.

It does more than that ... it _adds_ swsusp to ppc ! swsusp wasn't in
mainline at all for ppc because I consider it not ready. And even the
asm change should go through me anyway since I wrote that code and I'm
not sure they know all the possible "issues" with that code.

> So, what to do now?
> 
> a) just revert it
> 
> or
> 
> b) revert pmac_setup.c and via-pmu parts and Kconfig part
> 
> or
> 
> c) just disable Kconfig part and fix it up with incremental patches

I'll decide later today. I may well keep it and do the cleanup I had in
mind on top of this, which means merging the pmac suspend-to-ram with
the common infrastructure. But that will need some changes & hooks to
the core swsusp.

Ben.




Re: OHCI driver dies on Mac Mini at boot

2005-03-14 Thread Ben Collins
Send me the lspci -v output for this machine.

On Tue, Mar 15, 2005 at 11:25:36AM +1100, Benjamin Herrenschmidt wrote:
> I get this output with current Linus bk : 
> 
> [0.00] Total memory = 512MB; using 1024kB for hash table (at 8040)
> [0.00] Linux version 2.6.11-gack ([EMAIL PROTECTED]) (gcc version 
> 3.3.5 (Debian 1:3.3.5-8)) #5 Tue Mar 15 11:20:41 EST 2005
> [0.00] Found UniNorth memory controller & host bridge, revision: 210
> [0.00] Mapped at 0xfdd68000
> [0.00] Found a Intrepid mac-io controller, rev: 0, mapped at 
> 0xfdce8000
> [0.00] Processor NAP mode on idle enabled.
> [0.00] PowerMac motherboard: Mac mini
> [0.00] Found UniNorth PCI host bridge at 0xf000. Firmware bus 
> number: 0->0
> [0.00] Found UniNorth PCI host bridge at 0xf200. Firmware bus 
> number: 0->0
> [0.00] Found UniNorth PCI host bridge at 0xf400. Firmware bus 
> number: 0->0
> [0.00] via-pmu: Server Mode is disabled
> [0.00] PMU driver 2 initialized for Core99, firmware: 55
> [0.00] nvram: Checking bank 0...
> [0.00] nvram: gen0=122, gen1=121
> [0.00] nvram: Active bank is: 0
> [0.00] nvram: OF partition at 0x410
> [0.00] nvram: XP partition at 0x1020
> [0.00] nvram: NR partition at 0x1120
> [0.00] On node 0 totalpages: 131072
> [0.00]   DMA zone: 131072 pages, LIFO batch:16
> [0.00]   Normal zone: 0 pages, LIFO batch:1
> [0.00]   HighMem zone: 0 pages, LIFO batch:1
> [0.00] Built 1 zonelists
> [0.00] Kernel command line: s
> [0.00] PowerMac using OpenPIC irq controller at 0x8004
> [0.00] OpenPIC Version 1.2 (4 CPUs and 64 IRQ sources) at fc496000
> [0.00] OpenPIC timer frequency is 4.16 MHz
> [0.00] PID hash table entries: 4096 (order: 12, 65536 bytes)
> [0.00] GMT Delta read from XPRAM: 0 minutes, DST: off
> [0.00] time_init: decrementer frequency = 41.620997 MHz
> [   69.359998] Console: colour dummy device 80x25
> [   69.360691] Dentry cache hash table entries: 131072 (order: 7, 524288 
> bytes)
> [   69.361601] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> [   69.380943] Memory: 512896k available (2580k kernel code, 1336k data, 176k 
> init, 0k highmem)
> [   69.380964] System.map loaded at 0x811bd000 for debugger, size: 963019 
> bytes
> [   69.380972] AGP special page: 0x9000
> [   69.381051] Calibrating delay loop... 1413.12 BogoMIPS (lpj=706560)
> [   69.402356] Mount-cache hash table entries: 512
> [   69.404165] NET: Registered protocol family 16
> [   69.405106] PCI: Probing PCI hardware
> [   69.406287] PCI: Cannot allocate resource region 0 of device 0001:10:18.0
> [   69.406299] PCI: Cannot allocate resource region 0 of device 0001:10:19.0
> [   69.406322] Apple USB OHCI 0001:10:18.0 disabled by firmware
> [   69.406331] Apple USB OHCI 0001:10:19.0 disabled by firmware
> [   69.406390] Registering openpic with sysfs...
> [   69.407120] SCSI subsystem initialized
> [   69.407258] usbcore: registered new driver usbfs
> [   69.407289] usbcore: registered new driver hub
> [   69.408749] Installing knfsd (copyright (C) 1996 [EMAIL PROTECTED]).
> [   69.409293] PCI: Enabling device :00:10.0 (0006 -> 0007)
> [   69.925273] radeonfb (:00:10.0): Invalid ROM signature 0 should 
> be0xaa55
> [   69.925294] radeonfb: Retreived PLL infos from Open Firmware
> [   69.925303] radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=190.00 Mhz, 
> System=250.00 MHz
> [   69.925314] radeonfb: PLL min 12000 max 35000
> [   70.517246] radeonfb: Monitor 1 type DFP found
> [   70.517263] radeonfb: EDID probed
> [   70.517269] radeonfb: Monitor 2 type DFP found
> [   70.517275] radeonfb: EDID probed
> [   70.611713] Console: switching to colour frame buffer device 240x75
> [   70.611913] radeonfb (:00:10.0): ATI Radeon Yb 
> [   70.626083] Generic RTC Driver v1.07
> [   70.626274] Macintosh non-volatile memory driver v1.1
> [   70.626540] io scheduler noop registered
> [   70.626702] io scheduler anticipatory registered
> [   70.626861] io scheduler deadline registered
> [   70.627015] io scheduler cfq registered
> [   70.627552] RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 
> blocksize
> [   70.627979] loop: loaded (max 8 devices)
> [   70.628137] sungem.c:v0.98 8/24/03 David S. Miller (davem@redhat.com)
> [   70.690585] PHY ID: 4061e4, addr: 0
> [   70.691210] eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 
> 00:11:24:76:c8:ea 
> [   70.691524] eth0: Found BCM5221 PHY
> [   70.691694] pcnet32.c:v1.30i 06.28.2004 [EMAIL PROTECTED]
> [   70.691954] PPP generic driver version 2.4.2
> [   70.692139] PPP Deflate Compression module registered
> [   70.692337] MacIO PCI driver attached to Intrepid chipset
> [   70.692956] input: Macintosh mouse button emulation
> [   70.693211] apm_emu: APM Emulation 0.5 initialized.
> [   70.693421] 

Re: [PATCH] 2.6.11-mm3 patch for ext3 writeback "nobh" option

2005-03-14 Thread Andrew Morton
Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> Here is the 2.6.11-mm3 version of patch for adding "nobh"
>  support for ext3 writeback mode.

Care to update Documentation/filesystems/ext3.txt?

>  Can you include it in -mm ?

Spose so.

Did you have performance and resource consumption numbers to justify it?  I
think I asked that before and promptly forgot the answer, which is a good
reason for taking some care over changelog maintenance...


Re: [PATCH] ES7000 Legacy Mappings Update

2005-03-14 Thread Andrew Morton

You triggered my trivia twitch.

Jason Davis <[EMAIL PROTECTED]> wrote:
>
>  - * ES7000 has no legacy identity mappings
>  + * Older generations of ES7000 have no legacy identity mappings
>*/
>  -if (es7000_plat)
>  +if (es7000_plat && es7000_plat < 2) 
>   return;

Why not

if (es7000_plat == 1)

?

>   /* 
>  diff -Naurp linux-2.6.11.3/arch/i386/mach-es7000/es7000plat.c 
> linux-2.6.11.3-legacy/arch/i386/mach-es7000/es7000plat.c
>  --- linux-2.6.11.3/arch/i386/mach-es7000/es7000plat.c2005-03-13 
> 01:44:41.0 -0500
>  +++ linux-2.6.11.3-legacy/arch/i386/mach-es7000/es7000plat.c 2005-03-14 
> 11:52:44.0 -0500
>  @@ -138,7 +138,14 @@ parse_unisys_oem (char *oemptr, int oem_
>   es7000_plat = 0;
>   } else {
>   printk("\nEnabling ES7000 specific features...\n");
>  -es7000_plat = 1;
>  +/*
>  + * Check to see if this is a x86_64 ES7000 machine.
>  + */
>  +if (!(boot_cpu_data.x86 <= 15 && boot_cpu_data.x86_model <= 2))
>  +es7000_plat = 2;
>  +else
>  +es7000_plat = 1;
>  +

Perhaps some nice enumerated identifiers here, rather than magic numbers?
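
One possible shape for that suggestion, as a standalone sketch rather than
the eventual patch: the enumerator names and both helper functions below are
hypothetical, and only the values 0, 1 and 2 and the family/model test come
from the quoted diff.

```c
/* Hypothetical names; the posted patch only uses the literals 0, 1 and 2. */
enum es7000_platform {
    ES7000_NONE    = 0,  /* not an ES7000 */
    ES7000_CLASSIC = 1,  /* older generation: no legacy identity mappings */
    ES7000_X86_64  = 2   /* newer generation detected by the CPU check */
};

/* Same condition as the quoted diff, expressed with the named values. */
static enum es7000_platform es7000_classify(int x86_family, int x86_model)
{
    if (!(x86_family <= 15 && x86_model <= 2))
        return ES7000_X86_64;
    return ES7000_CLASSIC;
}

/* Equivalent to "es7000_plat && es7000_plat < 2" for the three values
 * above, which is the simplification asked about in the review. */
static int es7000_skip_legacy_mappings(enum es7000_platform plat)
{
    return plat == ES7000_CLASSIC;
}
```

With names like these the early return in the mapping code would read as a
comparison against ES7000_CLASSIC instead of against 1.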


Re: [patch] oom_kill fix

2005-03-14 Thread Andrew Morton
Coywolf Qi Hunt <[EMAIL PROTECTED]> wrote:
>
>  This oom_kill fix is to do mmput(mm) a bit earlier and returning 0 or 1
>  to indicate success or failure instead of returning mm_struct pointer. 

Why is this a "fix"?  What bug is it fixing?


Re: [PATCH] Automatically append a semi-random version for BK users

2005-03-14 Thread Ryan Anderson
Automatically append a semi-random version if the tree we're building
isn't tagged in BitKeeper (or another SCM) and CONFIG_LOCALVERSION_AUTO
is set.

This fixes the case when Linus (or someone else) does a release and tags
it, someone else does a build of that release tree (i.e, 2.6.11), and
installs it.  Later, before another release occurs (i.e, -rc1), another
build happens, and the actual, released 2.6.11 is overwritten with the
-current tree.

This currently supports BitKeeper only, but support for other SCMs is
easy to add.

Signed-Off-By: Ryan Anderson <[EMAIL PROTECTED]>


Index: local-quilt/Makefile
===
--- local-quilt.orig/Makefile   2005-03-14 19:17:41.0 -0500
+++ local-quilt/Makefile2005-03-14 20:45:11.0 -0500
@@ -549,6 +549,26 @@ export KBUILD_IMAGE ?= vmlinux
 # images. Default is /boot, but you can set it to other values
 export INSTALL_PATH ?= /boot
 
+# If CONFIG_LOCALVERSION_AUTO is set, we automatically perform some tests
+# and try to determine if the current source tree is a release tree, of any 
sort,
+# or if is a pure development tree.
+#
+# A 'release tree' is any tree with a BitKeeper, or other SCM, TAG associated
+# with it.  The primary goal of this is to make it safe for a native
+# BitKeeper/CVS/SVN user to build a release tree (i.e, 2.6.9) and also to
+# continue developing against the current Linus tree, without having the Linus
+# tree overwrite the 2.6.9 tree when installed.
+#
+# Currently, only BitKeeper is supported.
+# Other SCMs can edit scripts/setlocalversion and add the appropriate
+# checks as needed.
+
+
+ifdef CONFIG_LOCALVERSION_AUTO
+   localversion-auto := $(shell $(PERL) $(srctree)/scripts/setlocalversion 
$(srctree))
+   LOCALVERSION := $(LOCALVERSION)$(localversion-auto)
+endif
+
 #
 # INSTALL_MOD_PATH specifies a prefix to MODLIB for module directory
 # relocations required by build roots.  This is not defined in the
Index: local-quilt/init/Kconfig
===
--- local-quilt.orig/init/Kconfig   2005-03-14 19:17:41.0 -0500
+++ local-quilt/init/Kconfig2005-03-14 20:49:45.0 -0500
@@ -69,6 +69,21 @@ config LOCALVERSION
  object and source tree, in that order.  Your total string can
  be a maximum of 64 characters.
 
+config LOCALVERSION_AUTO
+   bool "Automatically append version information to the version string"
+   default y
+   help
+ This will try to automatically determine if the current tree is a
+ release tree by looking for BitKeeper, or other SCM tags that
+ belong to the current top of tree revision.
+
+ A string of the format -BK will be added to the
+ localversion.  The string generated by this will be appended 
+ after any matching localversion* files, and after the 
+ value set in CONFIG_LOCALVERSION
+ Note: This requires Perl and the Digest::MD5 module, as well
+ as BitKeeper.
+
 config SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU
Index: local-quilt/scripts/setlocalversion
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ local-quilt/scripts/setlocalversion 2005-03-14 20:41:01.0 -0500
@@ -0,0 +1,85 @@
+#!/usr/bin/perl
+# Copyright 2004 - Ryan Anderson <[EMAIL PROTECTED]>  GPL v2
+
+use strict;
+use warnings;
+use Digest::MD5;
+require 5.006;
+
+if (@ARGV != 1) {
+   print <
+EOT
+   exit(1);
+}
+
+my ($srctree) = @ARGV;
+
+my @LOCALVERSIONS = ();
+
+# BitKeeper Version Checks
+
+# We are going to use the following commands to try and determine if this
+# repository is at a Version boundary (i.e, 2.6.10 vs 2.6.10 + some patches) We
+# currently assume that all meaningful version boundaries are marked by a tag.
+# We don't care what the tag is, just that something exists.
+#
+# The process is as follows:
+#
+# 1. Get the key of the top of tree changeset:
+#  cset=`bk changes -r+ -k`
+#This will be something like:
+#[EMAIL PROTECTED]|ChangeSet|20050314010036|43252
+#
+# 2. Get the tag, if any, associated with it:
+#   bk prs -h -d':TAG:\n' -r$cset
+#
+# 3. If no such tag exists, take the hex-encoded md5sum of the
+# changeset key, extract the first 8 characters of it, and add
+# -BK and the above 8 characters to the end of the version.
+
+sub do_bk_checks {
+   chdir($srctree);
+   my $changeset = `bk changes -r+ -k`;
+   chomp $changeset; # strip trailing \n safely
+   my $tag = `bk prs -h -d':TAG:' -r'$changeset'`;
+
+   if (length($tag) == 0) {
+   # There is no tag at the Top of Tree changeset, so this is not
+   # a release tree.  To distinguish this from the previous
+   # release tree, something must be appended to the version to
+   # 

Re: [PATCH][1/2] SquashFS

2005-03-14 Thread Nick Piggin
Matt Mackall wrote:
> > +   for (;;) {
>
> while (1)

I always thought for (;;) was preferred. Or at least acceptable?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Automatically append a semi-random version for BK users

2005-03-14 Thread Ryan Anderson
Snipping a bit as I go, thanks for the feedback, Sam.

On Mon, Mar 14, 2005 at 11:43:17PM +0100, Sam Ravnborg wrote:
> On Wed, Mar 09, 2005 at 03:06:38AM -0500, Ryan Anderson wrote:
> > Two approaches are present here, a Perl version that is set up to handle
> > other automatic version appends (i.e, a CVS version shouldn't be much
> > effort to add), and a simplistic shell version that depends on "md5sum".
> > Both approaches generate the same hash.
> 
> Please skip the shell version - add a note in Kconfig that enabling this
> option requires perl.

Thanks.  That makes this much easier to do.

> >  #export INSTALL_PATH=/boot
> >  
> > +# If CONFIG_LOCALVERSION_AUTO is set, we automatically perform some tests
> > +# and try to determine if the current source tree is a release tree, of any sort,
> > +# or if is a pure development tree.
> > +# A 'release tree' is any tree with a BitKeeper TAG associated with it.
> > +# The primary goal of this is to make it safe for a native BitKeeper user to
> > +# build a release tree (i.e, 2.6.9) and also to continue developing against the
> > +# current Linus tree, without having the Linus tree overwrite the 2.6.9 tree
> > +# when installed.
> > +#
> > +# (In the future, CVS and SVN support will be added as well.)
> > +
> > +ifeq ($(CONFIG_LOCALVERSION_AUTO),y)
> > +   ifeq ($(shell ls -d $(srctree)/BitKeeper 2>/dev/null),$(srctree)/BitKeeper)
> > +   localversion-bk := $(shell $(srctree)/scripts/setlocalversion.sh $(srctree) $(objtree))
> > +   LOCALVERSION := $(LOCALVERSION)$(localversion-bk)
> > +   endif
> > +endif
> Move the logic to determine the SCM system into the perl script.
> And do not assume bk, select more generic names.

Ok, simple enough to do.

> Also use:
> ifdef CONFIG_LOCALVERSION_AUTO
> like in rest of Makefile.
> 
> Something like this:
> ifdef CONFIG_LOCALVERSION_AUTO
> LOCALVERSION += $(CONFIG_SHELL) $(srctree)/scripts/setlocalversion.sh $(srctree)
> endif
> note - perl script does not use objtree.

I forget why I had that there - I'll remove it.

> diff -Nru a/scripts/setlocalversion b/scripts/setlocalversion
> > --- /dev/null   Wed Dec 31 16:00:00 196900
> > +++ b/scripts/setlocalversion   2005-03-09 02:51:15 -05:00

> > +
> > +# We are going to use the following commands to try and determine if
> > +# this repository is at a Version boundary (i.e, 2.6.10 vs 2.6.10 + some patches)
> > +# We currently assume that all meaningful version boundaries are marked by a tag.
> > +# We don't care what the tag is, just that something exists.
> > +
> > [EMAIL PROTECTED] ~/dev/linux/local$ T=`bk changes -r+ -k`
> > [EMAIL PROTECTED] ~/dev/linux/local$ bk prs -h -d':TAG:\n' -r$T
> - to be deleted?

I'll simply rewrite as a better explanation of what's going on, for the
poor Perl neophytes. :)

> > +
> > +sub do_bk_checks {
> > +   chdir($srctree);
> > +   my $changeset = `bk changes -r+ -k`;
> > +   chomp $changeset;
> > +   my $tag = `bk prs -h -d':TAG:' -r'$changeset'`;
> > +
> > +   printf("ChangeSet Key = '%s'\nTAG = '%s'\n", $changeset, $tag) if ($debug > 0);
> > +
> > +   if (length($tag) == 0) {
> If this implies that this is not a release tree then please write so - to
> be consistent with the comment in the top-level Makefile.
> That's good for poor souls like me who do not know perl.

Ok, I think I've made what's going on much more apparent in my revised
version.
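
For readers who do not read Perl, the decision quoted in this message can be sketched in C. This is only an illustration: it assumes the tag string and the hex MD5 of the changeset key have already been obtained elsewhere, and `bk_suffix` and its parameters are hypothetical names that do not appear in the patch. The "-BK" prefix follows the wording of the Kconfig help text.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes the localversion suffix into out.
 * tag     - output of the tag query, empty if the changeset is untagged
 * md5_hex - hex MD5 of the changeset key (computed elsewhere)
 * Returns 1 if a suffix was written, 0 if the tree is a release tree.
 */
static int bk_suffix(const char *tag, const char *md5_hex,
                     char *out, size_t outlen)
{
    if (outlen == 0)
        return 0;
    out[0] = '\0';

    /* A tagged top-of-tree changeset means a release tree: no suffix. */
    if (tag[0] != '\0')
        return 0;

    /* Untagged: "-BK" plus the first 8 hex digits of the hash. */
    snprintf(out, outlen, "-BK%.8s", md5_hex);
    return 1;
}
```

A caller would pass an empty string for an untagged tree and append the resulting suffix to the existing localversion.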

> > +   # We do not have a tag at the Top of Tree, so we need to generate a localversion file
> > +   # We'll use the given $changeset as input into this.
> > +   my $localversion = Digest::MD5::md5_hex($changeset);
> > +   $localversion = substr($localversion,0,8);
> > +
> > +   printf("localversion = '%s'\n",$localversion) if ($debug > 0);
> > +
> > +   push @LOCALVERSIONS, "BK" . $localversion;
> > +
> > +   }
> > +}
> > +
> > +
> > +if ( -d "BitKeeper" ) {
> > +   my $bk = `which bk`;
> > +   chomp $bk;
> > +   if (length($bk) != 0) {
> > +   do_bk_checks();
> > +   }
> > +}
> And this part should be prepared for cvs+svn.

I was going to implement the CVS version tonight, but I can't seem to
find the bk2cvs tree (Google is failing).

I'll send an incremental patch when I do track it down, for now, I'll
reply to this with my new version.

Again, thanks for the feedback.

-- 

Ryan Anderson
  sometimes Pug Majere


Re: [patch trivial] as-iosched fix path to Documentation

2005-03-14 Thread Adrian Bunk
On Thu, Mar 10, 2005 at 12:42:23AM +0100, maximilian attems wrote:

> From: Klaus Ita <[EMAIL PROTECTED]>
> 
> subject says all, patch still applies.
>...

Fix is already in -mm for some time.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: Fix irq_affinity write from /proc for IPF

2005-03-14 Thread Andrew Morton
Ashok Raj <[EMAIL PROTECTED]> wrote:
>
> > Is it not possible for ia64's ->set_affinity() handler to do this deferring?
> > 
> 
> There are other places where we re-program, and its fine to call the 
> current version of set_affinity directly, like when we are doing cpu offline
> and trying to force migrate irqs for ia64.
> 
> Changing the default set_affinity() for ia64 would result in many changes, 
> this still keeps the same purpose of those access functions, and 
> differentiates the proc write cases alone without changing the meaning 
> of those handler functions. (and a smaller patch)
> 
> this would further complicate the force migrate irq's when we consider 
> MSI interrupts as well. Since it would have its own set_affinity, and we need
> to hack into MSI's set affinity handler as well which would complicate things.

OK, just checking.

I'll include this change in the next batch, probably post-2.6.12-rc1, thanks.


[swsusp/ppc] Re: What's going on here ?

2005-03-14 Thread Pavel Machek
Hi!

> Hi just see that the whole stack of pmac SWSUSP patches went in, without
> any notice nor CC nor anything from any of the PPC maintainers ! That is
> a bit annoying don't you think ?
> 
> Paulus and I wrote most of those patches, granted, and they've been
> hanging around for some time, but I had good reasons not to submit them
> in their current state.
> 
> And regardless, I'm pretty pissed off by the fact that such invasive
> changes to the architecture and the platform support were submitted and
> merged without any notice nor ack from any of the arch or platform
> maintainers (basically paulus and me).

Sorry, that's probably my fault.

rjw and hugang did (pretty necessary) changes to base swsusp (pagedir
table -> pagedir linklist), that unfortunately needed update to all
the assembly parts. It was series 1/3 update core, i386 and x86-64,
2/3 update ppc, 3/3 introduce initramfs.
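
To make the table-to-linklist change concrete, here is a small user-space sketch in C of walking such a list during resume. The three field names come from the DEFINE() lines in the forwarded patch; the field types, the 4 KiB page size, and the names `pbe_sketch` and `restore_pages` are assumptions for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

/* Simplified stand-in for struct pbe; only the three fields that the
 * ppc patch exports to assembly are modelled here. */
struct pbe_sketch {
    void *address;            /* where the saved copy of the page lives */
    void *orig_address;       /* where the page must be copied back to */
    struct pbe_sketch *next;  /* next entry, NULL at the end of the list */
};

/* Walk the list and copy every saved page back; returns pages copied. */
static size_t restore_pages(const struct pbe_sketch *list)
{
    size_t n = 0;
    const struct pbe_sketch *p;

    for (p = list; p != NULL; p = p->next) {
        memcpy(p->orig_address, p->address, SKETCH_PAGE_SIZE);
        n++;
    }
    return n;
}
```

With an array the assembly could step by a fixed stride; with a list it has to load the next pointer each time, which is why every architecture's copy loop needed touching.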

This is the offending patch I believe (but the version that was merged
was From: me, without code changes).

I realized that patch does more than changing from table to linklist,
but it looked mostly okay, so I forwarded it. Sorry.

So, what to do now?

a) just revert it

or

b) revert pmac_setup.c and via-pmu parts and Kconfig part

or

c) just disable Kconfig part and fix it up with incremental patches

?
Pavel


From: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
To: Andrew Morton <[EMAIL PROTECTED]>
Subject: [PATCH][3/3] swsusp: use non-contiguous memory
Cc: Hu Gang <[EMAIL PROTECTED]>,
LKML , Pavel Machek <[EMAIL PROTECTED]>
Lines: 686

From: Hu Gang <[EMAIL PROTECTED]>

Subject: swsusp: use non-contiguous memory on resume - ppc support

This patch contains the architecture-dependent changes for ppc
required for using a linklist instead of an array of page backup entries
during resume.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>

diff -Nru linux-2.6.11-a/arch/ppc/Kconfig linux-2.6.11-b/arch/ppc/Kconfig
--- linux-2.6.11-a/arch/ppc/Kconfig 2005-03-02 08:38:33.0 +0100
+++ linux-2.6.11-b/arch/ppc/Kconfig 2005-03-04 18:42:16.0 +0100
@@ -1046,6 +1046,8 @@
 
 source "drivers/zorro/Kconfig"
 
+source kernel/power/Kconfig
+
 endmenu
 
 menu "Bus options"
diff -Nru linux-2.6.11-a/arch/ppc/kernel/asm-offsets.c linux-2.6.11-b/arch/ppc/kernel/asm-offsets.c
--- linux-2.6.11-a/arch/ppc/kernel/asm-offsets.c 2005-03-02 08:38:09.0 +0100
+++ linux-2.6.11-b/arch/ppc/kernel/asm-offsets.c 2005-03-04 18:42:16.0 +0100
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -136,6 +137,10 @@
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
 
+   DEFINE(pbe_address, offsetof(struct pbe, address));
+   DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));
+   DEFINE(pbe_next, offsetof(struct pbe, next));
+
DEFINE(NUM_USER_SEGMENTS, TASK_SIZE>>28);
return 0;
 }
diff -Nru linux-2.6.11-a/arch/ppc/kernel/Makefile linux-2.6.11-b/arch/ppc/kernel/Makefile
--- linux-2.6.11-a/arch/ppc/kernel/Makefile 2005-03-02 08:38:25.0 +0100
+++ linux-2.6.11-b/arch/ppc/kernel/Makefile 2005-03-04 18:42:16.0 +0100
@@ -16,6 +16,7 @@
semaphore.o syscalls.o setup.o \
cputable.o ppc_htab.o perfmon.o
 obj-$(CONFIG_6xx)  += l2cr.o cpu_setup_6xx.o
+obj-$(CONFIG_SOFTWARE_SUSPEND) += swsusp.o
 obj-$(CONFIG_POWER4)   += cpu_setup_power4.o
 obj-$(CONFIG_MODULES)  += module.o ppc_ksyms.o
 obj-$(CONFIG_NOT_COHERENT_CACHE)   += dma-mapping.o
diff -Nru linux-2.6.11-a/arch/ppc/kernel/signal.c linux-2.6.11-b/arch/ppc/kernel/signal.c
--- linux-2.6.11-a/arch/ppc/kernel/signal.c 2005-03-02 08:38:33.0 +0100
+++ linux-2.6.11-b/arch/ppc/kernel/signal.c 2005-03-04 18:42:16.0 +0100
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -704,6 +705,14 @@
unsigned long frame, newsp;
int signr, ret;
 
+   if (current->flags & PF_FREEZE) {
+   refrigerator(PF_FREEZE);
+   signr = 0;
+   ret = regs->gpr[3];
+   if (!signal_pending(current))
+   goto no_signal;
+   }
+
if (!oldset)
oldset = &current->blocked;
 
@@ -726,6 +735,7 @@
regs->gpr[3] = EINTR;
/* note that the cr0.SO bit is already set */
} else {
+no_signal:
regs->nip -= 4; /* Back up & retry system call */
regs->result = 0;
regs->trap = 0;
diff -Nru linux-2.6.11-a/arch/ppc/kernel/swsusp.S 
linux-2.6.11-b/arch/ppc/kernel/swsusp.S
--- linux-2.6.11-a/arch/ppc/kernel/swsusp.S 1970-01-01 

[topic change] jiffies as a time value

2005-03-14 Thread john stultz
On Mon, 2005-03-14 at 15:40 -0800, George Anzinger wrote:
> john stultz wrote:
> > On Sat, 2005-03-12 at 16:49 -0800, Matt Mackall wrote:
> >>>+  /* finally, update legacy time values */
> >>>+  write_seqlock_irqsave(&xtime_lock, x_flags);
> >>>+  xtime = ns2timespec(system_time + wall_time_offset);
> >>>+  wall_to_monotonic = ns2timespec(wall_time_offset);
> >>>+  wall_to_monotonic.tv_sec = -wall_to_monotonic.tv_sec;
> >>>+  wall_to_monotonic.tv_nsec = -wall_to_monotonic.tv_nsec;
> >>>+  /* XXX - should jiffies be updated here? */
> >>
> >>Excellent question. 
> > 
> > Indeed.  Currently jiffies is used as both an interrupt counter and a
> > time unit, and I'm trying to make it just the former. If I emulate it then
> > it stops functioning as an interrupt counter, and if I don't then I'll
> > probably break assumptions about jiffies being a time unit. So I'm not
> > sure which is the easiest path to go until all the users of jiffies are
> > audited for intent. 
> 
> Really?  Who counts interrupts???  The timer code treats jiffies as a unit of 
> time.  You will need to rewrite that to make it otherwise.  

Ug. I'm thin on time this week, so I was hoping to save this discussion
for later, but I guess we can get into it now.

Well, assuming timer interrupts actually occur HZ times a second, yes
one could (and in current practice, one does) implicitly interpret jiffies
as being a valid notion of time.  However with SMIs, bad drivers that
disable interrupts for too long, and virtualization the reality is that
that assumption doesn't hold. 

We do have the lost-ticks compensation code that tries to help this, but
that conflicts with some virtualization implementations. Suspend/resume
tries to compensate jiffies for ticks missed over time suspended, but
I'm not sure how accurate it really is (additionally, looking at it now,
it assumes jiffies is only 32bits).

Adding to that the confusion over jiffies not really incrementing at HZ
but at ACTHZ, or bad drivers that assume HZ=100, we get a fair amount
of trouble stemming from folks using jiffies as a time value.  Because
in reality, it is just an interrupt counter.

So now, if new timeofday code emulates jiffies, we have to decide if it
emulates jiffies at HZ or ACTHZ? Also there could be issues with jiffies
possibly jittering from it being incremented every tick and then set to
the proper time when the timekeeping code runs. 

I'm not sure which is the best way to go, but it sounds like emulating
it is probably the easiest. I just deferred the question with a comment
until now because it's not completely obvious. Any suggestions on the
above questions? (I'm guessing the answers are: use ACTHZ, and the jitter
won't hurt that bad.)

> But then you have 
> another problem.  To correctly function, times need to expire on time (hay how
> bout that) not some time later.  To do this we need an interrupt source.  To
> this point in time, the jiffies interrupt has been the indication that one or
> more timer may have expired.  While we don't need to "count" the interrupts, we
> DO need them to expire the timers AND they need to be on time.

Well, something Nish Aravamudan has been working on is converting the
common users of jiffies (drivers) to start using human time units. These
very well understood units (which avoid HZ/ACTHZ/HZ=100 assumptions) can
then be accurately changed to jiffies (or possibly some other time unit)
internally. It would even be possible for soft-timers to expire based
upon the actual high-res time value, rather than the low-res
tick-counter (which is something else Nish has been playing with). When that
occurs we can easily start doing other interesting things that I believe
you've already been working on in your HRT code, such as changing the
timer interrupt frequency dynamically, or working with multiple timer
interrupt sources. 
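
As a rough illustration of what converting from human time units to ticks means, here is a C sketch. `ms_to_ticks` is a hypothetical name; in the kernel, msecs_to_jiffies() plays this role, and this sketch does not claim to match its implementation. The tick rate is passed in explicitly so that callers never hard-code HZ=100.

```c
/*
 * Convert milliseconds to timer ticks, rounding up so that a timeout
 * never fires early. hz is the tick rate in ticks per second.
 * Note: ms * hz can overflow for very large arguments; a real
 * implementation would have to guard against that.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999UL) / 1000UL;
}
```

A driver that asks for "15 ms" gets 2 ticks at 100 ticks per second and 15 ticks at 1000, without ever knowing which rate the kernel was built with.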

So basically, lots of interesting questions and possibilities and I very
much look forward to your input and suggestions. 

thanks
-john



Re: [PATCH][2/2] SquashFS

2005-03-14 Thread Andrew Morton
Phillip Lougher <[EMAIL PROTECTED]> wrote:
>
> 

Please don't send multiple patches with the same Subject:.  Choose nice,
meaningful Subject:s for each patch.  And include the relevant changelog
details within the email for each patch rather than in patch 1/N.  See
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt and
http://linux.yyz.us/patch-format.html.


> @@ -0,0 +1,439 @@

[lots of comments from patch 1/2 are applicable here]

> +#define SQUASHFS_MAX_FILE_SIZE   ((long long) 1 << \
> + (SQUASHFS_MAX_FILE_SIZE_LOG - 1))

1LL would suit here.  Or a cast to loff_t.

> +typedef unsigned int squashfs_block;
> +typedef long long squashfs_inode;

squashfs_block_t and squashfs_inode_t, please.  If one must use typedefs...

> +typedef struct squashfs_super_block {
> + unsigned int s_magic;
> + unsigned int inodes;
> + unsigned int bytes_used;
> + unsigned int uid_start;
> + unsigned int guid_start;
> + unsigned int inode_table_start;
> + unsigned int directory_table_start;
> + unsigned int s_major:16;
> + unsigned int s_minor:16;
> + unsigned int block_size_1:16;
> + unsigned int block_log:16;
> + unsigned int flags:8;
> + unsigned int no_uids:8;
> + unsigned int no_guids:8;
> + unsigned int mkfs_time /* time of filesystem creation */;
> + squashfs_inode  root_inode;
> + unsigned int block_size;
> + unsigned int fragments;
> + unsigned int fragment_table_start;
> +} __attribute__ ((packed)) squashfs_super_block;

Whoa.  Tons of bitfields in this file.  Are these on-disk data structures? 
If so, that's a problem for portability between architectures and possibly
compiler versions.  It also introduces locking complexity.

If they're in-core data structures then the bitfields are probably slower than
using `int', as well.

> +typedef struct {
> + unsigned int inode_type:4;
> + unsigned int mode:12; /* protection */
> + unsigned int uid:8; /* index into uid table */
> + unsigned int guid:8; /* index into guid table */
> +} __attribute__ ((packed)) squashfs_base_inode_header;

See, if one CPU is modifying `inode_type' while another CPU is modifying
`mode', this struct can get trashed.
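
One way around both the portability and the shared-word problem is to keep the on-disk form as raw bytes and decode it with explicit shifts into ordinary integers. The C sketch below assumes the four fields are packed into a little-endian 32-bit word with the first field in the lowest bits; that layout, and the names `base_inode_sketch` and `decode_base_inode`, are assumptions for illustration rather than the actual SquashFS format.

```c
#include <stdint.h>

/* In-core form: plain integers, no bitfields sharing a word. */
struct base_inode_sketch {
    unsigned int inode_type; /* 4 bits on disk */
    unsigned int mode;       /* 12 bits on disk */
    unsigned int uid;        /* 8 bits on disk */
    unsigned int guid;       /* 8 bits on disk */
};

/* Decode 4 raw on-disk bytes; behaves the same on any host byte order. */
static void decode_base_inode(const uint8_t raw[4],
                              struct base_inode_sketch *out)
{
    uint32_t v = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
                 ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);

    out->inode_type = v & 0xf;
    out->mode = (v >> 4) & 0xfff;
    out->uid = (v >> 16) & 0xff;
    out->guid = (v >> 24) & 0xff;
}
```

Because the decoding is written in terms of byte values, the compiler's bitfield placement rules never enter into it.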

> +/*
> + * macros to convert each packed bitfield structure from little endian to big
> + * endian and vice versa.  These are needed when creating or using a filesystem
> + * on a machine with different byte ordering to the target architecture.
> + *
> + */

hmm, OK..  Tell us more?

> + * bitfields and different bitfield placing conventions on differing
> + * architectures
> + */
> +
> +#include 
> +
> +#ifdef __BIG_ENDIAN
> + /* convert from little endian to big endian */
> +#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
> + tbits, b_pos)
> +#else
> + /* convert from big endian to little endian */ 
> +#define SQUASHFS_SWAP(value, p, pos, tbits) _SQUASHFS_SWAP(value, p, pos, \
> + tbits, 64 - tbits - b_pos)
> +#endif
> +
> +#define _SQUASHFS_SWAP(value, p, pos, tbits, SHIFT) {\
> + int bits;\
> + int b_pos = pos % 8;\
> + unsigned long long val = 0;\
> + unsigned char *s = (unsigned char *)p + (pos / 8);\
> + unsigned char *d = ((unsigned char *) &val) + 7;\
> + for(bits = 0; bits < (tbits + b_pos); bits += 8) \
> + *d-- = *s++;\
> + value = (val >> (SHIFT))/* & ((1 << tbits) - 1)*/;\
> +}

Can the standard leXX_to_cpu() helpers not be used here?

> +#include 
> +
> +typedef struct {
> + unsigned int block;
> + int length;
> + unsigned int next_index;
> + char *data;
> + } squashfs_cache;

Whitespace inconsistency (column 1 for the closing brace is standard)

--- linux-2.6.11.3/init/do_mounts_rd.c  2005-03-13 06:44:30.0 +
+++ linux-2.6.11.3-squashfs/init/do_mounts_rd.c 2005-03-14 00:53:28.092559728 +

Your changelog didn't mention that squashfs interacts with the boot
process.  That's the sort of thing which is nice to tell people about.

> +SQUASHFS FILESYSTEM
> +P: Phillip Lougher
> +M: [EMAIL PROTECTED]
> +W: http://squashfs.sourceforge.net
> +L: [EMAIL PROTECTED]
> +S: Maintained
> +

Lots of little comments, but I have no fundamental problems with the
patches as long as the bitfield issue is shown to be a non-issue.

Please respin the patches and unless someone else sees a showstopper I'll
merge them into -mm for further testing and review, thanks.


Re: AGP module removal impossible ?

2005-03-14 Thread Dave Jones
On Tue, Mar 15, 2005 at 12:28:51AM +0100, Brice Goglin wrote:
 > Hi Dave,
 > 
 > I can't remove the AGP chipset module on my boxes.
 > Looks like the AGP chipset driver holds a reference on itself and
 > thus makes removal impossible.
 > 
 > From what I understand, as soon as intel_agp is loaded, agp_intel_probe
 > is called. It gets a reference on intel_agp module through
 > !try_module_get(bridge->driver->owner) in agp_add_bridge.
 > Then this reference can only be released through module_put in
 > agp_remove_bridge which is called by agp_intel_remove which is only called
 > when removing the module.
 > 
 > Thus it looks impossible to remove this module at all.
 > And I think the problem occurs with all other AGP chipset drivers.
 > 
 > I hope the reason is not just that module removal support is not important
 > in 2.6 :) It looks strange to implement a module removal routine if we
 > know it can't be used :)

The locking is screwed up and has been for some time.
I've been meaning to take a look at it for a while, but keep finding
more important things to do.  It should be fixed to lock/unlock when
the device is opened, as it was in 2.4
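
A toy user-space model in C of the scheme described above: the reference is taken when the device is opened and dropped when it is closed, so an idle driver can still be unloaded. All names here are hypothetical; the real fix would use the kernel's module reference counting rather than a bare counter.

```c
/* Hypothetical stand-in for a module's use count. */
struct module_sketch {
    int refcount;
};

/* Take a reference when the device node is opened. */
static int agp_sketch_open(struct module_sketch *owner)
{
    owner->refcount++;
    return 0;
}

/* Drop one reference on each close. */
static void agp_sketch_release(struct module_sketch *owner)
{
    if (owner->refcount > 0)
        owner->refcount--;
}

/* Unloading is allowed only while nobody holds the device open. */
static int agp_sketch_can_unload(const struct module_sketch *owner)
{
    return owner->refcount == 0;
}
```

The difference from the behaviour reported in the quoted message is that probing the bridge takes no reference at all, so the count is zero whenever the device is not in use.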

Dave



Re: [PATCH] gcc4 fix for sn_serial.c

2005-03-14 Thread Adrian Bunk
On Mon, Mar 14, 2005 at 11:32:39AM -0800, Jesse Barnes wrote:
> The sal_console and sal_console_uart structures have a circular relationship 
> since they both initialize member fields to pointers of one another.  The 
> current code forward declares sal_console_uart as extern so that sal_console 
> can take its address, but gcc4 complains about this since the real definition 
> of sal_console_uart is marked 'static'.  This patch just removes the static 
> qualifier from sal_console_uart to avoid the inconsistency.  Does it look ok 
> to you, Pat?
>...
> = drivers/serial/sn_console.c 1.12 vs edited =
> --- 1.12/drivers/serial/sn_console.c  2005-03-07 20:41:31 -08:00
> +++ edited/drivers/serial/sn_console.c2005-03-14 10:57:19 -08:00
> @@ -801,7 +801,7 @@
>  
> >  #define SAL_CONSOLE  &sal_console
>  
> -static struct uart_driver sal_console_uart = {
> +struct uart_driver sal_console_uart = {
>   .owner = THIS_MODULE,
>   .driver_name = "sn_console",
>   .dev_name = DEVICE_NAME,

Why can't you solve this without making sal_console_uart global?
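
One possible way to keep both objects static, sketched in plain C with made-up types: a static tentative definition gives the second object internal linkage before its initializer appears, so the first object can take its address. Whether this fits sn_console.c as it stands is not established here, and the struct and variable names below are hypothetical.

```c
struct uart_sketch;

struct console_sketch {
    const char *name;
    struct uart_sketch *data;
};

struct uart_sketch {
    const char *driver_name;
    struct console_sketch *cons;
};

/* Tentative definition: declares the object with internal linkage so
 * that its address can be used below. Valid in C, not in C++. */
static struct uart_sketch uart_obj;

static struct console_sketch console_obj = {
    .name = "console",
    .data = &uart_obj,
};

/* The real definition, with initializer, completes the cycle. */
static struct uart_sketch uart_obj = {
    .driver_name = "sn_console",
    .cons = &console_obj,
};
```

Both declarations of `uart_obj` agree on `static`, which avoids the extern-versus-static mismatch that gcc4 rejects.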

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH][1/2] SquashFS

2005-03-14 Thread Andrew Morton
Phillip Lougher <[EMAIL PROTECTED]> wrote:
>
> Please apply the following two patches which adds SquashFS to the
> kernel.

> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "squashfs.h"
> +

We normally put asm includes after linux includes:

#include 
#include 

#include 
#include 

> +
> +DECLARE_MUTEX(read_data_mutex);
> +

Does this need to have global scope?  If so, it needs a less generic name. 
`squashfs_read_data_mutex' would suit.

> +static struct file_system_type squashfs_fs_type = {
> + .owner = THIS_MODULE,
> + .name = "squashfs",
> + .get_sb = squashfs_get_sb,
> + .kill_sb = kill_block_super,
> + .fs_flags = FS_REQUIRES_DEV
> + };
> +

The final brace should go in column 1.

> +
> +static struct buffer_head *get_block_length(struct super_block *s,
> + int *cur_index, int *offset, int *c_byte)
> +{
> + squashfs_sb_info *msBlk = (squashfs_sb_info *)s->s_fs_info;

s_fs_info has type void*.  Hence there is no need to typecast when
assigning pointers to or from it.  In fact it is a little harmful to do so.

Please search both your patches for all occurrences of s_fs_info and remove
the typecasts.  There are many.


> + unsigned short temp;
> + struct buffer_head *bh;
> +
> + if (!(bh = sb_bread(s, *cur_index)))
> + return NULL;
> +
> + if (msBlk->devblksize - *offset == 1) {
> + if (msBlk->swap)
> + ((unsigned char *) &temp)[1] = *((unsigned char *)
> + (bh->b_data + *offset));
> + else
> + ((unsigned char *) &temp)[0] = *((unsigned char *)
> + (bh->b_data + *offset));

All this typecasting looks nasty.  Is there a nicer way?  Perhaps using a
temporary variable?

Is this code endian-safe?
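
A sketch of what a cast-free, endian-safe version could look like in C. It assumes the caller knows the on-disk byte order and passes the two bytes separately, since they may come from different buffers when the value straddles a block boundary. `get_u16` and its `big_endian_disk` flag are hypothetical; the patch's own `swap` flag means something different (host and filesystem byte order disagree).

```c
#include <stdint.h>

/*
 * Assemble a 16-bit value from two bytes in on-disk order.
 * Shifts operate on values, not memory, so the result does not
 * depend on the byte order of the machine running the code.
 */
static uint16_t get_u16(uint8_t first, uint8_t second, int big_endian_disk)
{
    if (big_endian_disk)
        return (uint16_t)((first << 8) | second);
    return (uint16_t)(first | (second << 8));
}
```

The straddling case then reduces to reading one byte from the end of one buffer and one from the start of the next before calling the helper.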

> + if (msBlk->swap)
> + ((unsigned char *) &temp)[0] = *((unsigned char *)
> + bh->b_data); 
> + else
> + ((unsigned char *) &temp)[1] = *((unsigned char *)
> + bh->b_data); 
> + *c_byte = temp;
> + *offset = 1;
> + } else {
> + if (msBlk->swap) {
> + ((unsigned char *) &temp)[1] = *((unsigned char *)
> + (bh->b_data + *offset));
> + ((unsigned char *) &temp)[0] = *((unsigned char *)
> + (bh->b_data + *offset + 1)); 
> + } else {
> + ((unsigned char *) &temp)[0] = *((unsigned char *)
> + (bh->b_data + *offset));
> + ((unsigned char *) &temp)[1] = *((unsigned char *)
> + (bh->b_data + *offset + 1)); 
> + }

Ditto.

> +
> + if (SQUASHFS_CHECK_DATA(msBlk->sBlk.flags)) {
> + if (*offset == msBlk->devblksize) {
> + brelse(bh);
> + if (!(bh = sb_bread(s, ++(*cur_index))))
> + return NULL;
> + *offset = 0;
> + }
> + if (*((unsigned char *) (bh->b_data + *offset)) !=
> + SQUASHFS_MARKER_BYTE) {
> + ERROR("Metadata block marker corrupt @ %x\n",
> + *cur_index);
> + brelse(bh);
> + return NULL;

Multiple return statements per function are a maintainability problem,
especially if some of them are deep inside that function's logic.  The old
`goto out' is preferred.

(Imagine what would happen if you later wanted to change this function to
kmalloc a bit of temp storage and you don't want it to leak).
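
A user-space C sketch of the single-exit pattern, with malloc() standing in for kmalloc(). The function, the marker value, and the label names are invented for illustration and are not taken from the patch.

```c
#include <stdlib.h>
#include <string.h>

#define MARKER_BYTE 0xff  /* hypothetical marker value */

/*
 * Returns 0 if buf[offset] holds the marker, -1 on any failure.
 * Every path after the allocation leaves through out_free, so the
 * scratch buffer is released in exactly one place.
 */
static int check_marker(const unsigned char *buf, size_t len, size_t offset)
{
    int ret = -1;
    unsigned char *scratch = malloc(len ? len : 1);

    if (!scratch)
        goto out;
    if (offset >= len)
        goto out_free;

    memcpy(scratch, buf, len);
    if (scratch[offset] != MARKER_BYTE)
        goto out_free;

    ret = 0;
out_free:
    free(scratch);
out:
    return ret;
}
```

Adding another resource later means adding one label and one cleanup call, instead of auditing every return statement.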

> + }
> + (*offset) ++;

whitespace.

> +unsigned int squashfs_read_data(struct super_block *s, char *buffer,
> + unsigned int index, unsigned int length,
> + unsigned int *next_index)
> +{
> + squashfs_sb_info *msBlk = (squashfs_sb_info *)s->s_fs_info;
> + struct buffer_head *bh[((SQUASHFS_FILE_MAX_SIZE - 1) >>
> + msBlk->devblksize_log2) + 2];

Dynamically sized local storage.  Deliberate?  What is the upper bound on
its size?

> +block_release:
> + while (--b >= 0) brelse(bh[b]);

while (--b >= 0)
brelse(bh[b]);

please.

> +
> + if (n == 0) {
> + wait_queue_t wait;
> +
> + init_waitqueue_entry(, current);
> + add_wait_queue(>waitq, );
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + up(>block_cache_mutex);
> +

What's going on here ?

2005-03-14 Thread Benjamin Herrenschmidt
Hi just see that the whole stack of pmac SWSUSP patches went in, without
any notice nor CC nor anything from any of the PPC maintainers ! That is
a bit annoying don't you think ?

Paulus and I wrote most of those patches, granted, and they've been
hanging around for some time, but I had good reasons not to submit them
in their current state.

And regardless, I'm pretty pissed off by the fact that such invasive
changes to the architecture and the platform support were submitted and
merged without any notice nor ack from any of the arch or platform
maintainers (basically paulus and me).

Ben.




Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread Christoph Lameter
On Mon, 14 Mar 2005, Matt Mackall wrote:
> We can either stick all the generic mmio timer functions in the
> vsyscall page (they're tiny) or leave the vsyscall using type/ptr but
> have the kernel internally use only the function pointer. Someone
> who's more familiar with the vsyscall timer code should chime in here.

No we cannot do any function calls in a fastcall path on ia64. The current
design is ok. Why duplicate the functionality with additional indirect
function calls? Plus an indirect function call stalls pipelines on some
processors and will limit the performance of gettimeofday.



Re: [RFC][PATCH] new timeofday core subsystem (v. A3)

2005-03-14 Thread john stultz
On Mon, 2005-03-14 at 16:28 -0800, Christoph Lameter wrote:
> On Mon, 14 Mar 2005, john stultz wrote:
> 
> > Huh. So if I understand you properly, all timesources should have valid
> > read_fnct pointers that return the cycle value, however we'll still
> > preserve the type and mmio_ptr so fsyscall/vsyscall bits can use them
> > externally?
> >
> > Hmm. I'm a little cautious, as I really want to make the vsyscall
> > gettimeofday and regular do_gettimeofday be a similar as possible to
> > avoid some of the bugs we've seen between different gettimeofday
> > implementations. However I'm not completely against the idea.
> >
> > Christoph: Do you have any thoughts on this?
> 
> Sorry to be late to the party. It would be a weird implementation to have
> two ways to obtain time for each timesource. Also would be even more a
> headache to maintain than the existing fastcall vs. fullcall.

That's my feeling as well, unless a more convincing argument comes up.

thanks
-john



Re: [PATCH 2.6.11] IBM TrackPoint support

2005-03-14 Thread Dmitry Torokhov
On Mon, 14 Mar 2005 08:40:22 -0500, Stephen Evanchik <[EMAIL PROTECTED]> wrote:
> On Mon, 14 Mar 2005 13:19:56 +0100, Vojtech Pavlik <[EMAIL PROTECTED]> wrote:
> > How much does it interpret the stream in non-transparent mode? Are
> > commands also passed through in soft transparent mode?
> >
> > I'm asking because we might want to implement a passthrough port
> > similarly to what the Synaptics driver does and allow extended protocol
> > mice to be connected to the external mouse port.
> 
> I originally thought that I could implement something similar to the
> Synaptics driver. Unfortunately, while in transparent mode bytes are
> relayed unmodified with the TrackPoint controller disabled. In other
> words, no simultaneous usage.
> 
> That doesn't mean extended protocol mice couldn't be supported in
> transparent mode however. I didn't find it particularly useful given
> the TrackPoint itself would be disabled.
> 

Here is my take on it (now that I have skimmed the TrackPoint spec) -
transparent mode is to be used only when querying the external device.
This way trackpoint does not interfere with data stream at all and the
kernel gets a chance to know exactly what is behind the trackpoint -
Logitech, explorer, something more exotic... Once identification is
done transparent mode should be cancelled.

Bit 3 can be used to de-multiplex 2 streams; hopefully trackpoint is
able to relay packets longer than 3 bytes from the external device.

As far as I can see there is no point in exporting transparent mode to
userspace via sysfs. I also do not think that we need to export
middle_button_disable as it is "..for compatibility with older
software expecting this bit be always 0" and we do not have such an
issue. Also, if we implement a pass-through port, then ext_dev is also
not needed since the user can either unbind the driver from the
pass-through port or just ignore the secondary input device in his/her
config.

-- 
Dmitry


Re: [PATCH][1/2] SquashFS

2005-03-14 Thread Matt Mackall
A quick skim...

> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * inode.c
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "squashfs.h"

Please put all the asm/ bits after all the linux/ bits.

> +
> +static void squashfs_put_super(struct super_block *);
> +static int squashfs_statfs(struct super_block *, struct kstatfs *);
> +static int squashfs_symlink_readpage(struct file *file, struct page *page);
> +static int squashfs_readpage(struct file *file, struct page *page);
> +static int squashfs_readpage4K(struct file *file, struct page *page);
> +static int squashfs_readdir(struct file *, void *, filldir_t);
> +static void squashfs_put_super(struct super_block *s);
> +static struct inode *squashfs_alloc_inode(struct super_block *sb);
> +static void squashfs_destroy_inode(struct inode *inode);
> +static int init_inodecache(void);
> +static void destroy_inodecache(void);
> +static struct dentry *squashfs_lookup(struct inode *, struct dentry *,
> + struct nameidata *);
> +static struct inode *squashfs_iget(struct super_block *s, squashfs_inode inode);
> +static unsigned int read_blocklist(struct inode *inode, int index,
> + int readahead_blks, char *block_list,
> + unsigned short **block_p, unsigned int *bsize);
> +static struct super_block *squashfs_get_sb(struct file_system_type *, int,
> + const char *, void *);

Would be nice to reorder things so that fewer forward declarations
were needed.

> +static z_stream stream;
> +
> +static struct file_system_type squashfs_fs_type = {
> + .owner = THIS_MODULE,
> + .name = "squashfs",
> + .get_sb = squashfs_get_sb,
> + .kill_sb = kill_block_super,
> + .fs_flags = FS_REQUIRES_DEV
> + };

Weird whitespace.

> +static struct buffer_head *get_block_length(struct super_block *s,
> + int *cur_index, int *offset, int *c_byte)
> +{
> + squashfs_sb_info *msBlk = (squashfs_sb_info *)s->s_fs_info;

Needless cast from void *. Mixed case identifiers are discouraged.

> + if (!(bh = sb_bread(s, *cur_index)))
> + return NULL;

Please don't do assignment inside if().
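
For illustration, the same shape of check with the assignment hoisted
out of the condition.  This is a userspace sketch: read_block() is a
hypothetical stand-in for sb_bread(), and BLOCK_SIZE is made up.

```c
#include <stddef.h>

#define BLOCK_SIZE 4	/* hypothetical block size for the sketch */

/* Hypothetical stand-in for sb_bread(): returns NULL on failure. */
static const unsigned char *read_block(const unsigned char *disk,
				       size_t nblocks, size_t index)
{
	if (index >= nblocks)
		return NULL;
	return disk + index * BLOCK_SIZE;
}

/* Return the first byte of block 'index', or -1 if it cannot be read. */
static int first_byte(const unsigned char *disk, size_t nblocks, size_t index)
{
	const unsigned char *bh;

	/* Assign first, then test, instead of if (!(bh = read_block(...))) */
	bh = read_block(disk, nblocks, index);
	if (!bh)
		return -1;

	return bh[0];
}
```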

> + if (msBlk->devblksize - *offset == 1) {
> + if (msBlk->swap)
> + ((unsigned char *) )[1] = *((unsigned char *)
> + (bh->b_data + *offset));
> + else
> + ((unsigned char *) )[0] = *((unsigned char *)
> + (bh->b_data + *offset));

That's rather ugly, what's going on here? There seems to be a lot of
this swapping going on. At the very least, let's use u8.
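
One reading of the quoted fragment is that it assembles a two-byte
value one byte at a time, picking the destination byte according to
the swap flag.  If so, a sketch along these lines (userspace, with
uint8_t standing in for u8) makes the byte order explicit without
casting through unsigned char pointers.  get_u16() and its
'big_endian' flag are hypothetical.

```c
#include <stdint.h>

/*
 * Combine two bytes, given in the order they appear on disk, into a
 * 16-bit value.  'big_endian' selects the on-disk byte order.
 */
static uint16_t get_u16(uint8_t first, uint8_t second, int big_endian)
{
	if (big_endian)
		return (uint16_t)((first << 8) | second);
	return (uint16_t)((second << 8) | first);
}
```

Because the two bytes are passed separately, it does not matter
whether they came from the same buffer or from two adjacent ones.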

> +block_release:
> + while (--b >= 0) brelse(bh[b]);

Linebreak.

> + for (i = msBlk->next_cache, n = SQUASHFS_CACHED_BLKS;
> + n ; n --, i = (i + 1) %
> + SQUASHFS_CACHED_BLKS)

Messy. Do n-- (no space), handle i outside the for control structures.
Perhaps break this piece out into a separate function, the indenting
is making things cramped.
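
Something like the following is what I mean, as a userspace sketch.
The loop body is not part of the quoted hunk, so the "in_use" test is
an assumption, and CACHED_BLKS, struct cache_entry and
find_free_slot() are hypothetical names.

```c
#define CACHED_BLKS 8	/* hypothetical stand-in for SQUASHFS_CACHED_BLKS */

/* Hypothetical cache slot; only the field needed for the scan. */
struct cache_entry {
	int in_use;
};

/*
 * Scan the ring once, starting at 'start', and return the index of the
 * first free slot, or -1 if every slot is busy.
 */
static int find_free_slot(const struct cache_entry *cache, int start)
{
	int i = start;
	int n;

	for (n = CACHED_BLKS; n; n--) {
		if (!cache[i].in_use)
			return i;
		/* Advance the ring index in the body, not in the for header. */
		i = (i + 1) % CACHED_BLKS;
	}

	return -1;
}
```

The caller can then treat -1 as "everything busy, go wait" without
the extra level of indentation.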

> + if (n == 0) {
> + wait_queue_t wait;
> +
> + init_waitqueue_entry(&wait, current);
> + add_wait_queue(&msBlk->waitq, &wait);
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + up(&msBlk->block_cache_mutex);
> + schedule();
> + set_current_state(TASK_RUNNING);
> + remove_wait_queue(&msBlk->waitq, &wait);

I suspect you'll find there's a much cleaner way to do whatever it is you're
trying to do here.

> + if (!(msBlk->block_cache[i].data =
> + (unsigned char *)
> + kmalloc(SQUASHFS_METADATA_SIZE,
> + GFP_KERNEL))) {

Another class of unnecessary cast.

> + msBlk->fragment_index[SQUASHFS_FRAGMENT_INDEX(fragment)];
> + int offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment);

Feel free to make these defines a little less unwieldy. So long as
they're internal to Squashfs.

> + for (;;) {

while (1)

> + for (i = 0; i < SQUASHFS_CACHED_FRAGMENTS &&
> + msBlk->fragment[i].block != start_block; i++);

';' on its own line. Better is 

for (i = 0; i < SQUASHFS_CACHED_FRAGMENTS; i++)
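
The message is cut off at this point in the archive.  One plausible
shape for such a loop, as a userspace sketch only: the test moves into
the body and the loop breaks on a match.  CACHED_FRAGMENTS,
struct fragment_entry and find_fragment() are hypothetical names.

```c
#define CACHED_FRAGMENTS 3	/* hypothetical stand-in for SQUASHFS_CACHED_FRAGMENTS */

/* Hypothetical cache entry; only the field the search compares. */
struct fragment_entry {
	unsigned int block;
};

/*
 * Return the index of the entry caching 'start_block', or
 * CACHED_FRAGMENTS if no entry matches.
 */
static int find_fragment(const struct fragment_entry *fragment,
			 unsigned int start_block)
{
	int i;

	for (i = 0; i < CACHED_FRAGMENTS; i++)
		if (fragment[i].block == start_block)
			break;

	return i;
}
```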

[PATCH] reiserfs: use NULL instead of 0

2005-03-14 Thread Randy.Dunlap
(resend)

Use NULL instead of 0 for pointer (sparse warning):
fs/reiserfs/namei.c:611:50: warning: Using plain integer as NULL pointer

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

diffstat:=
 fs/reiserfs/namei.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -Naurp ./fs/reiserfs/namei.c~reiserfs_null ./fs/reiserfs/namei.c
--- ./fs/reiserfs/namei.c~reiserfs_null 2005-02-15 13:48:46.327310224 -0800
+++ ./fs/reiserfs/namei.c   2005-02-15 20:53:22.903281976 -0800
@@ -608,7 +608,7 @@ static int reiserfs_create (struct inode
 goto out_failed;
 }
 
-retval = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode);
+retval = reiserfs_new_inode (&th, dir, mode, NULL, 0/*i_size*/, dentry, inode);
 if (retval)
 goto out_failed;



---


Re: [CHECKER] XFS doesn't respect mount -o sync (XFS, 2.6.11)

2005-03-14 Thread Nathan Scott
On Sat, Mar 12, 2005 at 02:14:50AM -0800, Junfeng Yang wrote:
> 
> Hi,
> 
> We are from the Stanford Checker team and are working on a file system
> checker called FiSC.  We checked XFS and found that even when a XFS
> partition is mounted -o sync, file system operations are still not sync'ed
> correctly.

It's -o wsync in XFS.  This is the IRIX way, anyway - from a bit of
man page reading.  We should be stitching that into -o sync a bit
better in XFS.  The combination of -o wsync,sync should get you the
equivalent behaviour at the moment though, I think.

> We are not sure if this is the expected behavior on XFS or not, so your
> inputs on this are well appreciated.

Try using -o wsync for your tests, I'll look into our interpretation
of -o sync in XFS in the meantime.

cheers.

-- 
Nathan

