Re: rcu-refcount stacker performance

2005-07-14 Thread serue
Quoting Paul E. McKenney ([EMAIL PROTECTED]):
> On Thu, Jul 14, 2005 at 08:44:50AM -0500, [EMAIL PROTECTED] wrote:
> > Quoting Paul E. McKenney ([EMAIL PROTECTED]):
> > > My guess is that the reference count is indeed costing you quite a
> > > bit.  I glance quickly at the patch, and most of the uses seem to
> > > be of the form:
> > > 
> > >   increment ref count
> > >   rcu_read_lock()
> > >   do something
> > >   rcu_read_unlock()
> > >   decrement ref count
> > > 
> > > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > > need to increment the reference count in these cases?
> > 
> > The problem is on module unload: is it possible for CPU1 to be
> > on "do something", and sleep, and, while it sleeps, CPU2 does
> > rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> > is executing has been freed?
> 
> OK, but in the above case, "do something" cannot be sleeping, since
> it is under rcu_read_lock().

Oh, but that's not quite what the code is doing, rather it is doing:

	rcu_read_lock
	while get next element from list
		inc element.refcount
		rcu_read_unlock
		do something
		rcu_read_lock
		dec refcount
	rcu_read_unlock

What I plan to try next is:

	rcu_read_lock
	while get next element from list
		if (element->owning_module->state != LIVE)
			continue
		rcu_read_unlock
		do something
		rcu_read_lock
	rcu_read_unlock
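
A minimal userspace sketch of the list walk planned above, assuming hypothetical names throughout: struct lsm_entry, enum mod_state, call_live_hooks() and count_hook() are illustrations, not stacker code, and rcu_read_lock()/rcu_read_unlock() are no-op placeholders so that it compiles outside the kernel. It only shows the control flow; whether the state check alone closes the unload race is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's module state. */
enum mod_state { MOD_LIVE, MOD_COMING, MOD_GOING };

struct lsm_entry {
	enum mod_state state;      /* stands in for owning_module->state */
	int (*hook)(void);         /* the "do something" step; may sleep */
	struct lsm_entry *next;
};

/* No-op placeholders so the sketch builds in userspace. */
static void rcu_read_lock(void) {}
static void rcu_read_unlock(void) {}

/* Example hook that just counts how often it ran. */
static int hook_calls;
static int count_hook(void) { return ++hook_calls; }

/* Call the hook of every entry whose module is live.
 * Returns the number of hooks that were called. */
static int call_live_hooks(struct lsm_entry *head)
{
	int called = 0;
	struct lsm_entry *e;

	rcu_read_lock();
	for (e = head; e != NULL; e = e->next) {
		if (e->state != MOD_LIVE)
			continue;          /* loading or unloading: skip it */
		assert(e->hook != NULL);
		rcu_read_unlock();         /* hook may sleep: leave the read section */
		e->hook();
		called++;
		rcu_read_lock();           /* re-enter before following e->next */
	}
	rcu_read_unlock();
	return called;
}
```

With three entries of which the middle one is MOD_GOING, call_live_hooks() runs two hooks and skips the one being unloaded.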

> > Because stacker won't remove the lsm from the list of modules
> > until mod->exit() is executed, and module_free(mod) happens
> > immediately after that, the above scenario seems possible.
> 
> Right, if you have some other code path that sleeps (outside of
> rcu_read_lock(), right?), then you need the reference count for that
> code path.  But the code paths that do not sleep should be able to
> dispense with the reference count, reducing the cache-line traffic.

Most if not all of the codepaths can sleep, however.  So unfortunately
that doesn't seem a feasible solution.  That's why I'm hoping there is
something inherent in the module unload code that I can take advantage
of to forgo my own refcounting.

thanks,
-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] split PCI probing code [1/9]

2005-07-14 Thread Francois Romieu
Adam Belay <[EMAIL PROTECTED]> :
[...]

Some nits + a suspect error branch. It seems nice otherwise.

> --- a/drivers/pci/bus/bus.c   1969-12-31 19:00:00.0 -0500
> +++ b/drivers/pci/bus/bus.c   2005-07-10 22:32:53.0 -0400
[...]
> +struct pci_bus * pci_alloc_bus(void)
> +{
> + struct pci_bus *b;
> +
> + b = kmalloc(sizeof(*b), GFP_KERNEL);
> + if (b) {
> + memset(b, 0, sizeof(*b));

mm/slab.c provides kcalloc.
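
A minimal userspace sketch of the point, assuming calloc() as the userspace analogue of kcalloc(n, size, flags); struct bus and both helper names are hypothetical stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct pci_bus. */
struct bus {
	int number;
	void *sysdata;
};

/* Two-step form used in the patch: allocate, then clear. */
static struct bus *alloc_bus_memset(void)
{
	struct bus *b = malloc(sizeof(*b));

	if (b)
		memset(b, 0, sizeof(*b));
	return b;
}

/* One-step form: calloc() hands back zeroed memory directly. */
static struct bus *alloc_bus_calloc(void)
{
	return calloc(1, sizeof(struct bus));
}
```

Both helpers return an equivalent zeroed object; the second simply cannot forget the memset().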

[...]
> --- a/drivers/pci/bus/config.c1969-12-31 19:00:00.0 -0500
> +++ b/drivers/pci/bus/config.c2005-07-12 00:52:35.147664368 -0400
[...]
> +static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int 
> rom)
> +{
> + unsigned int pos, reg, next;
> + u32 l, sz;
> + struct resource *res;
> +
> + for(pos=0; pos
[...]
> +static struct pci_dev * __devinit
> +pci_scan_device(struct pci_bus *bus, int devfn)
> +{
[...]
> + dev = kmalloc(sizeof(struct pci_dev), GFP_KERNEL);
> + if (!dev)
> + return NULL;
> +
> + memset(dev, 0, sizeof(struct pci_dev));

kcalloc

[...]
> + /* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
> +set this higher, assuming the system even supports it.  */
> + dev->dma_mask = 0xffffffff;

DMA_32BIT_MASK

> + if (pci_setup_device(dev) < 0) {
> + kfree(dev);
> + return NULL;
> + }
> + device_initialize(&dev->dev);
> + dev->dev.release = pci_release_dev;
> + pci_dev_get(dev);
> +
> + pci_name_device(dev);
> +
> + dev->dev.dma_mask = &dev->dma_mask;
> + dev->dev.coherent_dma_mask = 0xffffffffull;

DMA_32BIT_MASK

[...]
> +struct pci_dev * __devinit
> +pci_scan_single_device(struct pci_bus *bus, int devfn)
> +{
> + struct pci_dev *dev;
> +
> + dev = pci_scan_device(bus, devfn);
> + pci_scan_msi_device(dev);
> +
> + if (!dev)
> + return NULL;

Why not do the test immediately ?

[...]
> --- a/drivers/pci/bus/probe.c 1969-12-31 19:00:00.0 -0500
> +++ b/drivers/pci/bus/probe.c 2005-07-12 00:55:50.580953992 -0400
[...]
> +int __devinit pci_scan_bridge(struct pci_bus *bus, struct pci_dev * dev, int 
> max, int pass)
[...]
> +
> + /* Prevent assigning a bus number that already exists.
> +  * This can happen when a bridge is hot-plugged */
> + if (pci_find_bus(pci_domain_nr(bus), max+1))

if (pci_find_bus(pci_domain_nr(bus), max + 1))

[...]
> + /*
> +  * For CardBus bridges, we leave 4 bus numbers
> +  * as cards with a PCI-to-PCI bridge can be
> +  * inserted later.
> +  */
> + for (i=0; i
[...]
> +int __devinit pci_scan_slot(struct pci_bus *bus, int devfn)
> +{
> + int func, nr = 0;
> + int scan_all_fns;
> +
> + scan_all_fns = pcibios_scan_all_fns(bus, devfn);
> +
> + for (func = 0; func < 8; func++, devfn++) {
> + struct pci_dev *dev;
> +
> + dev = pci_scan_single_device(bus, devfn);
> + if (dev) {
> + nr++;
> +
> + /*
> +  * If this is a single function device,
> +  * don't scan past the first function.
> +  */
> + if (!dev->multifunction) {
> + if (func > 0) {
> + dev->multifunction = 1;
> + } else {
> + break;
> + }

if (func == 0)
break;
dev->multifunction = 1;
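
A small sketch to check that the flattened form behaves like the nested original. Both helpers are hypothetical; they return true where the loop in the patch would "break", and take the multifunction flag by pointer so the side effect can be compared too.

```c
#include <assert.h>
#include <stdbool.h>

/* Nested form, as in the patch. */
static bool stop_scan_nested(int func, int *multifunction)
{
	if (!*multifunction) {
		if (func > 0) {
			*multifunction = 1;
		} else {
			return true;
		}
	}
	return false;
}

/* Flattened form, as suggested in the review. */
static bool stop_scan_flat(int func, int *multifunction)
{
	if (!*multifunction) {
		if (func == 0)
			return true;
		*multifunction = 1;
	}
	return false;
}
```

Since func only takes the values 0 through 7 in the loop, "func > 0" and "func != 0" select the same cases, so the two forms agree for every input.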


[...]
> +unsigned int __devinit pci_scan_child_bus(struct pci_bus *bus)
> +{
[...]
> + pcibios_fixup_bus(bus);
> + for (pass=0; pass < 2; pass++)

for (pass = 0; pass < 2; pass++)

[...]
> +struct pci_bus * __devinit pci_scan_bus_parented(struct device *parent, int 
> bus, struct pci_ops *ops, void *sysdata)
> +{
> + int error;
> + struct pci_bus *b;
> + struct device *dev;
> +
> + b = pci_alloc_bus();
> + if (!b)
> + return NULL;
> +
> + dev = kmalloc(sizeof(*dev), GFP_KERNEL);
> + if (!dev){
> + kfree(b);
> + return NULL;
> + }

The code below uses goto. Why not here ?
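
A userspace sketch of what consistent goto-based unwinding could look like here. All names are hypothetical (struct bus, struct bridge, scan_bus_sketch(), and the fail_at parameter, which exists only to force each error path); it is not the patch's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct pci_bus and struct device. */
struct bus { void *sysdata; };
struct bridge { struct bus *bus; };

/* fail_at forces a failure: 1 = first allocation, 2 = second,
 * 3 = a later step (e.g. the bus is already known), 0 = none. */
static struct bridge *scan_bus_sketch(int fail_at)
{
	struct bus *b;
	struct bridge *dev;

	b = (fail_at == 1) ? NULL : calloc(1, sizeof(*b));
	if (!b)
		goto err_out;

	dev = (fail_at == 2) ? NULL : calloc(1, sizeof(*dev));
	if (!dev)
		goto err_free_bus;

	if (fail_at == 3)
		goto err_free_dev;

	dev->bus = b;
	return dev;

	/* Labels release resources in reverse order of acquisition. */
err_free_dev:
	free(dev);
err_free_bus:
	free(b);
err_out:
	return NULL;
}
```

Every failure exits through one ladder of labels, so adding a new resource means adding one label rather than touching each early return.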

> +
> + b->sysdata = sysdata;
> + b->ops = ops;
> +
> + if (pci_find_bus(pci_domain_nr(b), bus)) {
> + /* If we already got to this bus through a different bridge, 
> ignore it */
> + pr_debug("PCI: Bus %04x:%02x already known\n", 
> pci_domain_nr(b), bus);
> + goto err_out;
> + }
> + spin_lock(&pci_bus_lock);
> + list_add_tail(&b->node, &pci_root_buses);
> + spin_unlock(&pci_bus_lock);
> +
> + memset(dev, 0, sizeof(*dev));

kcalloc

> + dev->parent = parent;
> + dev->release = pci_release_bus_bridge_dev;
> + 

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Chris Friesen

Linus Torvalds wrote:

> There's absolutely nothing wrong with "jiffies", and anybody who thinks
> that
>
>   msleep(20);
>
> is fundamentally better than
>
>   timeout = jiffies + HZ/50;
>
> just doesn't realize that the latter is a bit more complicated exactly
> because the latter is a hell of a lot more POWERFUL.


But if all I really want is to sleep for 20ms, what does the additional 
power actually buy me?


Chris


Re: moving DRM header files

2005-07-14 Thread Sam Ravnborg
> 
> When you start merging DRM and fbdev you will be able to use relative
> paths that are closer together.  For example #include
> "../char/drm/drmP.h" versus "#include "drm/drmP.h" for internal
> headers.

No. Using relative include paths is not good. It will most probably
not work with make O=.

Sam


Kernel Bug Report

2005-07-14 Thread Paul Vander Griend

System:
Motherboard = Tyan K8WE
Processor = 2x Opteron 250
Memory = 8GB ECC Registered

On all of the recent release candidates except for
2.6.13-rc2-git2 the kernel panics while booting. These
versions include 2.6.13-rc2-git* (* != 2 ) and 2.6.13-rc3.

I also want to mention that I am using gcc 3.3.5 on Debian and
that during compilation there are 3 messages at the end that
say an assertion has failed, i.e. (LD: assertion failed).

It looks like it panics during a memcpy, but I know it's
difficult to tell just by the output.

I get a code: f3 a4 c3 66 66 66 90 66 66 66 90 66 66 66 90 66

The problem appears very reproducible, so I can provide more
information upon request.

My .config is available upon request.

-Paul


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Arjan van de Ven
On Thu, 2005-07-14 at 09:37 -0700, Linus Torvalds wrote:


> There should be an _absolute_ interface

I'm not arguing there shouldn't be an absolute interface. I'm arguing
that *most* uses are relative, and as such a relative interface makes
sense for those cases.


> Btw, this is exactly why the jiffy-based thing is _good_. The kernel 
> timers _are_ absolute, and you make them relative by adding "jiffies".

again there is absolutely nothing wrong with having absolute timers and
a general notion of absolute time. Jiffies is one way of achieving that,
and it's the current linux way. I see the "absolute timers are good"
argument sort of separate from "jiffies / HZ are good" argument; there
is no reason in principle why such an interface couldn't be in, say, usec.


> There's absolutely nothing wrong with "jiffies", and anybody who thinks 
> that
> 
>   msleep(20);
> 
> is fundamentally better than
>
>   timeout = jiffies + HZ/50;

I *will* argue that for relative delays in drivers, msleep() is better.
The reason is different than you think of; the argument why I consider
msleep() better as interface for relative delays in drivers is that it
is harder for a driver writer to get wrong, by virtue of it being
simpler. jiffies and HZ conversion is one of those areas that driver
writers very often get wrong. (multiply by HZ not divide for example,
but there's a few dozen ways it can and does go wrong). A relative msec
based interface is a LOT harder to get wrong, and also often is closer
to what the datasheet of the hardware says. I'm not going to say "all
driver writers are stupid" because they're not; however too many of them
just act like they are too much of the time. That doesn't mean that
there is no room for a "powerful interface" next to a simple one, and I
hope you're not fully against adding a simple interface on top of a more
powerful one if that simple interface is a way to reduce mistakes and
thus bugs in drivers.
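
To make the conversion pitfall concrete, here is a hedged userspace sketch. ms_to_ticks() and ms_to_ticks_wrong() are hypothetical helpers (the first is in the spirit of, but not copied from, the kernel's msecs_to_jiffies()), and the tick rate is passed as a parameter instead of using the HZ macro.

```c
#include <assert.h>

/* Convert milliseconds to timer ticks at a given tick rate,
 * rounding up so that a delay is never shortened. */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* A typical slip: dividing by the tick rate instead of multiplying.
 * It happens to give the right answer at hz == 1000, which is
 * how such a bug can survive testing. */
static unsigned long ms_to_ticks_wrong(unsigned long ms, unsigned long hz)
{
	return ms * 1000 / hz;
}
```

For a 20 ms delay the correct helper gives 2 ticks at 100 Hz and 20 ticks at 1000 Hz; the wrong one gives 20 ticks at 1000 Hz but 200 ticks (two seconds) at 100 Hz. A driver calling msleep(20) never has to get this arithmetic right.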


> just doesn't realize that the latter is a bit more complicated exactly 
> because the latter is a hell of a lot more POWERFUL. Trying to get rid of 
> jiffies for some religious reason is _stupid_.

I have nothing religious against jiffies per se. My argument however is
that with a few simple, relative interfaces *in addition* to an absolute
interface, almost all drivers suddenly are isolated from jiffies and HZ
because they simply don't care. Because they really DON'T care about
absolute time. At all. 

Doing this will in turn open up flexibility in experimenting with how
one implements the timer stuff; there's suddenly a lot less code to
touch in doing so. Also, such a relative interface can match the intent a
lot better and is separated from the actual implementation.




Re: LKM function call on kernel function call?

2005-07-14 Thread Daniel Bonekeeper
You can also look into some methods of "function redirection
hooks"... add some opcodes at the start of the "hooked function"
(something like to add a CALL or JMP pointing to the address of your
function). There are docs about this subject, but unfortunately I
couldn't find anything now (http://www.ouah.org/p59-0x08.txt is not
exactly what I'm talking about, it's talking about ELF redirection).
It's a dirty thing to do, and it's not intended to be done in any
production thing (in fact, it's a *hack*).

On 7/5/05, S <[EMAIL PROTECTED]> wrote:
> Is it possible to code a loadable module having function1(), which
> would be called, everytime a particular function of the kernel is
> called? If not, atleast a way this could be done without re-compiling
> the whole kernel and rebooting the system?
> 
> Example:
> 
> My LKM:
> -
> 
> init_module() {
> ...
> }
> 
> function1() {
> ...
> }
> 
> cleanup_module() {
> ...
> }
> 
> 
> I want function1() to be called, everytime the function
> ide_do_rw_disk() of ide-disk.c is called. I do not want to re-compile
> the complete kernel to do this.
> 
> Thanks in advance,
> 
> Regards,
> S
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-c-programming" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
# (perl -e "while (1) { print "\x90"; }") | dd of=/dev/evil


Re: rcu-refcount stacker performance

2005-07-14 Thread Paul E. McKenney
On Thu, Jul 14, 2005 at 08:44:50AM -0500, [EMAIL PROTECTED] wrote:
> Quoting Paul E. McKenney ([EMAIL PROTECTED]):
> > My guess is that the reference count is indeed costing you quite a
> > bit.  I glance quickly at the patch, and most of the uses seem to
> > be of the form:
> > 
> > increment ref count
> > rcu_read_lock()
> > do something
> > rcu_read_unlock()
> > decrement ref count
> > 
> > Can't these cases rely solely on rcu_read_lock()?  Why do you also
> > need to increment the reference count in these cases?
> 
> The problem is on module unload: is it possible for CPU1 to be
> on "do something", and sleep, and, while it sleeps, CPU2 does
> rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
> is executing has been freed?

OK, but in the above case, "do something" cannot be sleeping, since
it is under rcu_read_lock().

> Because stacker won't remove the lsm from the list of modules
> until mod->exit() is executed, and module_free(mod) happens
> immediately after that, the above scenario seems possible.

Right, if you have some other code path that sleeps (outside of
rcu_read_lock(), right?), then you need the reference count for that
code path.  But the code paths that do not sleep should be able to
dispense with the reference count, reducing the cache-line traffic.

Thanx, Paul


Serial core: 8250_pci could not register serial port for UART chip EXAR XR17D152

2005-07-14 Thread V. ANANDA KRISHNAN
Hi all,

  I have come across a problem with my serial port EXAR chip XR
17D152 when I try to use the 8250_pci driver.  I am using
kernel-2.6.12.1 on RHEL4.0-U1 on a pSeries box with 4 CPUs.  During
boot, after detecting the EXAR chip (I checked the pci_dev structure
and the pci_device_id structure for the info), 8250_pci is
unable to get through the port registration (static int
__devinit pciserial_init_one(struct pci_dev *dev, const struct
pci_device_id *ent) procedure in 8250_pci.c).  I debugged the problem
and traced it up to the routine "static int uart_match_port(struct uart_port
*port1, struct uart_port *port2)" in 8250.c, where the UPIO_MEM case does
not satisfy the condition port1->membase==port2->membase and hence
returns 0.

  If I use printk to dump the port->membase value, the system
hangs during boot with a blank screen (on the serial terminal).
I have yet to try kernel-2.6.12.2.  Please let me know how to proceed
in this case.  Thanks,
V.Ananda Krishnan



Re: [11/11] x86_64: TASK_SIZE fixes for compatibility mode processes

2005-07-14 Thread Siddha, Suresh B
On Wed, Jul 13, 2005 at 08:49:47PM +0200, Andi Kleen wrote:
> On Wed, Jul 13, 2005 at 11:44:26AM -0700, Greg KH wrote:
> > -stable review patch.  If anyone has any objections, please let us know.
> 
> I think the patch is too risky for stable. I had even my doubts
> for mainline.

hmm.. Main reason why Andrew posted this for stable series is because of
the memory leak issue mentioned in the patch changeset comments...

We have not seen any stability issues because of this patch so far (it's been
there for more than a month in the -mm series). Lack of this patch is actually
causing us more trouble (DOS/app failures/..).

thanks,
suresh


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Linus Torvalds


On Thu, 14 Jul 2005, Vojtech Pavlik wrote:
>  
> A note on the relative timer API: There needs to be a way to say
> "x milliseconds from the time this timer should have triggered" instead
> of "x milliseconds from now", to avoid skew in timers that try to be
> strictly periodic.

I disagree.

There should be an _absolute_ interface, and a driver that wants that 
should just have calculated when in time the timeout finishes - and then 
keep on using the absolute value.

Btw, this is exactly why the jiffy-based thing is _good_. The kernel 
timers _are_ absolute, and you make them relative by adding "jiffies".
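
A hedged userspace sketch of the difference between rearming a periodic timer relative to "now" and rearming it from the previous absolute deadline. All three helpers are hypothetical; tick_after() follows the style of the kernel's time_after() and, like it, relies on the usual two's-complement conversion from unsigned long to long.

```c
#include <assert.h>

/* Wraparound-safe "a is after b" for a free-running tick counter. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Relative rearm: the next expiry is measured from the moment the
 * handler actually ran, so any latency accumulates as skew. */
static unsigned long rearm_relative(unsigned long now, unsigned long period)
{
	return now + period;
}

/* Absolute rearm: the next expiry is measured from the previous
 * deadline, so a late handler does not shift later expiries. */
static unsigned long rearm_absolute(unsigned long deadline,
				    unsigned long period)
{
	return deadline + period;
}
```

With a period of 20 ticks and a handler that always runs 1 tick late, ten rearms leave the relative timer 10 ticks behind schedule, while the absolute one is still exactly on a multiple of the period.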

The fact is, the current timers are better than people give them credit 
for, and converting them away from a jiffies-based interface (to a 
usleep-like one) is STUPID.

There's absolutely nothing wrong with "jiffies", and anybody who thinks 
that

msleep(20);

is fundamentally better than

timeout = jiffies + HZ/50;

just doesn't realize that the latter is a bit more complicated exactly 
because the latter is a hell of a lot more POWERFUL. Trying to get rid of 
jiffies for some religious reason is _stupid_.

I have to say, this whole thread has been pretty damn worthless in general 
in my not-so-humble opinion.

Linus


Re: rcu-refcount stacker performance

2005-07-14 Thread serue
Quoting Paul E. McKenney ([EMAIL PROTECTED]):
> On Thu, Jul 14, 2005 at 09:21:07AM -0500, [EMAIL PROTECTED] wrote:
> > On July 8 I sent out a patch which re-implemented the rcu-refcounting
> > of the LSM list in stacker for the sake of supporting safe security
> > module unloading.  (patch reattached here for convenience)  Here are
> > some performance results with and without that patch.  Tests were run
> > on a 16-way ppc64 machine.  Dbench was run 50 times, and kernbench
> > and reaim were run 10 times, and intervals are 95% confidence half-
> > intervals.
> > 
> > These results seem pretty poor.  I'm now wondering whether this is
> > really necessary.  David Wheeler's original stacker had an option
> > of simply waiting a while after a module was taken out of the list
> > of active modules before freeing the modules.  Something like that
> > is of course one option.  I'm hoping we can also take advantage of
> > some already known module state info to be a little less coarse
> > about it.  For instance, sys_delete_module() sets m->state to
> > MODULE_STATE_GOING before calling mod->exit().  If in place of
> > doing atomic_inc(&m->use), stacker skipped the m->hook() if
> > m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
> > any m->hook() should be finished before sys_delete_module() gets
> > to free_module(mod).  This seems to require adding a struct
> > module argument to security/security:mod_reg_security() so an LSM
> > can pass itself along.
> > 
> > So I'll try that next.  Hopefully by avoiding the potential cache
> > line bounces which atomic_inc(&m->use) bring, this should provide
> > far better performance.
> 
> My guess is that the reference count is indeed costing you quite a
> bit.  I glance quickly at the patch, and most of the uses seem to
> be of the form:
> 
>   increment ref count
>   rcu_read_lock()
>   do something
>   rcu_read_unlock()
>   decrement ref count
> 
> Can't these cases rely solely on rcu_read_lock()?  Why do you also
> need to increment the reference count in these cases?

The problem is on module unload: is it possible for CPU1 to be
on "do something", and sleep, and, while it sleeps, CPU2 does
rmmod(lsm), so that by the time CPU1 stops sleeping, the code it
is executing has been freed?

Because stacker won't remove the lsm from the list of modules
until mod->exit() is executed, and module_free(mod) happens
immediately after that, the above scenario seems possible.

thanks,
-serge


[PATCH] Fix the recent C-state with FADT regression

2005-07-14 Thread Venkatesh Pallipadi


Attached patch fixes the recent C-state based on FADT regression reported by
Kevin.

Please apply.

Thanks,
Venki


Fix the regression with c1_default_handler on some systems where C-states come
from FADT.

Thanks to Kevin Radloff for identifying the issue and also root causing on 
exact line of code that is causing the issue.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>

diff -purN  linux-2.6.13-rc1-mm1//drivers/acpi/processor_idle.c.org 
linux-2.6.13-rc1-mm1//drivers/acpi/processor_idle.c
--- linux-2.6.13-rc1-mm1//drivers/acpi/processor_idle.c.org 2005-07-14 
23:19:45.038854688 -0700
+++ linux-2.6.13-rc1-mm1//drivers/acpi/processor_idle.c 2005-07-14 
23:21:47.292269344 -0700
@@ -881,7 +881,7 @@ static int acpi_processor_get_power_info
result = acpi_processor_get_power_info_cst(pr);
if ((result) || (acpi_processor_power_verify(pr) < 2)) {
result = acpi_processor_get_power_info_fadt(pr);
-   if (result)
+   if ((result) || (acpi_processor_power_verify(pr) < 2))
result = acpi_processor_get_power_info_default_c1(pr);
}
 


Why is 2.6.12.2 less stable on my laptop than 2.6.10?

2005-07-14 Thread Mark Gross
I know this is a broken record, but the development process within the LKML 
isn't resulting in more stable and better code.  Some process change could be 
a good thing.

Why does my alps mouse pad have to stop working every time I test a new 
"STABLE" kernel?  

Why does swsusp have to start hanging on shutdown and startup randomly?

I rolled back my home box to 2.6.10 because I want some stability (2.6.10
has problems with swsusp from time to time, but it's livable for me, for now.)

The process is broken if on a stable series we cannot at least make sure 
obvious regressions don't smack users between the eyes.

I see the problem as that too much code flux is happening from people without
the resources, or discipline, to effectively regression test for side effects
of their changes.

I know there is a lot of back patting on how well the dot-dot stability 
release process is working, but that process is a solution for a different 
and simpler problem and we still have breakage.

Stability and deliberate feature design and development along with disciplined 
regression testing and validation is what is needed.  Why can't there be more 
targeted and planned development?  Are we in a race to see how many changes 
we can push into a "stable" tree?

Shouldn't changes be regression tested, formally, before they are allowed to go
into a tree?

Why can't I expect swsusp to work better and more reliably from release to
release?

I know there is a point where software goes from fun to work, but without more 
deliberate and disciplined WORK I see the 2.6 tree spinning out of control.

The problem is the process, not the code.
* The issues are too much ad hoc code flux without enough disciplined/formal
regression testing and review.
* Small regressions are accepted and expected to be caught later.
* Ad hoc validation before changes are accepted.

Some possible things that could help:

* Adopt a no-regressions-allowed policy where everything stops until any
identified regressions (in performance, functionality or stability) are fixed
or the changes are all rolled back.  This works really well if, in addition,
organized pre-flight testing is done before calling a new version number.
You simply cannot rely on ad hoc regression testing and reporting.  It has
too much latency.
* Assign validation folks that the developers need to appease before changes
are allowed to be accepted into the tree.
* Make all changes to the kernel not be submitted by the developers, but by
designated subsystem validation owners.  If too many bugs continue to sneak
by, address the problem by adding validation help to that subsystem or get a
new owner for the problem subsystem.  (<-- I like this one a lot.)
* start 2.7 
* all of the above (<--this one is good too)

--mgross
BTW: This may or may not be the opinion of my employer, more likely not.



pc_keyb: controller jammed (0xA7)

2005-07-14 Thread Thoralf Will
Hello,

I didn't find any useful answer anywhere so far, hope it's ok to ask here.
I'm currently trying to get a 2.4.31 up and running on an IBM
BladeCenter HS20/8843. (base system is a stripped down RH9)

When booting the kernel the console is spammed with:
   pc_keyb: controller jammed (0xA7)
But it seems there are no further consequences and the keyboard is
working. The only answer I've found is "disable usb legacy" in the BIOS,
but that's no solution for me because there is no option to disable USB
legacy support, and it wouldn't make any sense anyway because the
keyboard is a USB device, so I really do need support for USB.

Is there a workaround? Is this an already known bug? Anything wrong on
my side?


Thanks,
Thoralf


Re: About a change to the implementation of spin lock in 2.6.12 kernel.

2005-07-14 Thread Brandon Niemczyk
On Thu, 2005-07-14 at 09:21 -0700, [EMAIL PROTECTED] wrote:
> Hi Willy,
> 
> I think at least I can remove the LOCK instruction when the lock is already 
> held by someone else and enter the spinning wait directly, right?
If the lock is already held by someone else, the cpu is just going to
burn cycles until it's not. So why do you care?

-- 
Brandon Niemczyk



[PATCH] RealTimeSync Patch

2005-07-14 Thread Elias Kesh
Hello,

I would like to get some feedback on this patch for the kernel.  Its sole
purpose is to help reduce boot time by not waiting to synchronize the
clock edge with the hardware clock. This, when combined with other boot-time
reduction patches, can bring the kernel boot time to well under 10 seconds, in
most cases two or three seconds.  In a desktop system this patch is probably
insignificant; however, several patches like this in a set-top box or cell
phone will be significant.

 I understand that there may be some concerns with patches like these so I 
would like to start a discussion so that I can better understand what the 
issues are. The members of the CELinux Forum have quite a bit we would like to 
contribute.

Looking at the archives, I see that an Intel patch was submitted back in
October, but I am unable to determine what the resolution was.

The patch included here is for PPC, but other architectures are available on
the patch web site below.

Detailed information on the patch can be found here:
http://tree.celinuxforum.org/CelfPubWiki/RTCNoSync

In addition, other patches for boot time reduction can be found here:
http://tree.celinuxforum.org/CelfPubWiki/PatchArchive

Elias Kesh
[EMAIL PROTECTED]


* Fast boot options
*
Fast boot options (FASTBOOT) [N/y/?] (NEW) y
  Disable synch on read of Real Time Clock (RTC_NO_SYNC) [N/y/?] (NEW) y



diff -u -pruN -X ../dontdiff linux-2.6.12/arch/ppc/kernel/time.c 
linux-2.6.12_rtc_patch/arch/ppc/kernel/time.c
--- linux-2.6.12/arch/ppc/kernel/time.c 2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12_rtc_patch/arch/ppc/kernel/time.c   2005-07-02 
00:27:37.0 +0200
@@ -282,8 +282,12 @@ EXPORT_SYMBOL(do_settimeofday);
 /* This function is only called on the boot processor */
 void __init time_init(void)
 {
-   time_t sec, old_sec;
-   unsigned old_stamp, stamp, elapsed;
+   time_t sec;
+   unsigned stamp;
+#ifndef CONFIG_RTC_NO_SYNC
+   time_t old_sec;
+   unsigned old_stamp, elapsed;
+#endif
 
 if (ppc_md.time_init != NULL)
 time_offset = ppc_md.time_init();
@@ -308,6 +312,7 @@ void __init time_init(void)
stamp = get_native_tbl();
if (ppc_md.get_rtc_time) {
sec = ppc_md.get_rtc_time();
+#ifndef CONFIG_RTC_NO_SYNC
elapsed = 0;
do {
old_stamp = stamp;
@@ -320,6 +325,7 @@ void __init time_init(void)
} while ( sec == old_sec && elapsed < 2*HZ*tb_ticks_per_jiffy);
if (sec==old_sec)
printk("Warning: real time clock seems stuck!\n");
+#endif
xtime.tv_sec = sec;
xtime.tv_nsec = 0;
/* No update now, we just read the time from the RTC ! */
diff -u -pruN -X ../dontdiff linux-2.6.12/init/Kconfig 
linux-2.6.12_rtc_patch/init/Kconfig
--- linux-2.6.12/init/Kconfig   2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12_rtc_patch/init/Kconfig 2005-07-02 00:27:37.0 +0200
@@ -275,6 +275,33 @@ config KALLSYMS_EXTRA_PASS
   reported.  KALLSYMS_EXTRA_PASS is only a temporary workaround while
   you wait for kallsyms to be fixed.
 
+menuconfig FASTBOOT
+   bool "Fast boot options"
+   help
+ Say Y here to select among various options that can decrease
+ kernel boot time.  These options may involve providing
+ hardcoded values for some parameters that the kernel usually
+ determines automatically.
+
+ This option is useful primarily on embedded systems.
+
+ If unsure, say N.
+
+config RTC_NO_SYNC
+   bool "Disable synch on read of Real Time Clock" if FASTBOOT
+   default n
+   help
+ The Real Time Clock is read aligned by default. That means a
+ series of reads of the RTC are done until it's verified that
+  the RTC's state has just changed.  If you enable this feature,
+  this synchronization will not be performed.  The result is that
+ the machine will boot up to 1 second faster. 
+
+ A drawback is that, with this option enabled, your system
+ clock may drift from the correct value over the course
+ of several boot cycles (under certain circumstances).
+
+ If unsure, say N.
 
 config PRINTK
default y


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: About a change to the implementation of spin lock in 2.6.12 kernel.

2005-07-14 Thread multisyncfe991

Hi Willy,

I think at least I can remove the LOCK instruction when the lock is already 
held by someone else and enter the spinning wait directly, right?

0: cmpb $0, slp
   jle  2f          # lock is not available: spin directly without locking the bus
1: lock; decb slp   # lock the bus and atomically decrement
   jns  3f          # if clear sign bit jump forward to 3
2: pause            # spin - wait
   cmpb $0, slp     # spin - compare to 0
   jle  2b          # spin - go back to 2 if <= 0 (locked)
   jmp  1b          # unlocked; go back to 1 to try to lock again
3:                  # we have acquired the lock

But according to the Lockmeter report, the lock attempt succeeds 99.8% of
the time, so maybe this will not make much difference.
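
For readers following along in C: the sequence above is the test-and-test-and-set pattern (read the lock word with a plain load first, and only issue the bus-locked decrement when the lock looks free). Below is a minimal userspace sketch using C11 atomics. It is not the kernel's spinlock code, and the names ttas_lock, ttas_trylock, ttas_acquire and ttas_release are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace lock word: 1 = free, <= 0 = held (mirrors "slp"). */
typedef struct {
    atomic_schar slock;
} ttas_lock;

static inline void ttas_init(ttas_lock *l)
{
    atomic_init(&l->slock, 1);
}

/* One attempt: plain read first, atomic decrement only if it looked free. */
static inline bool ttas_trylock(ttas_lock *l)
{
    if (atomic_load_explicit(&l->slock, memory_order_relaxed) <= 0)
        return false;   /* looked held: skip the expensive locked operation */

    /* Atomic decrement, like "lock; decb": an old value of 1 means we won. */
    return atomic_fetch_sub_explicit(&l->slock, 1, memory_order_acquire) == 1;
}

static inline void ttas_acquire(ttas_lock *l)
{
    while (!ttas_trylock(l)) {
        /* Spin on plain reads until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->slock, memory_order_relaxed) <= 0)
            ;
    }
}

/* The holder is the only writer at this point, so a plain store suffices. */
static inline void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->slock, 1, memory_order_release);
}
```

The acquisition itself still goes through an atomic read-modify-write; the plain read only filters out attempts that would fail anyway. If most attempts succeed, as the Lockmeter numbers suggest, the extra read saves little.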

Thanks,

Liang

- Original Message - 
From: "Willy Tarreau" <[EMAIL PROTECTED]>

To: <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, July 13, 2005 10:16 PM
Subject: Re: About a change to the implementation of spin lock in 2.6.12 
kernel.




Hi,

On Wed, Jul 13, 2005 at 07:20:06PM -0700, [EMAIL PROTECTED] 
wrote:

Hi,

I found _spin_lock used a LOCK instruction to make the following
operation "decb %0" atomic. As you know, LOCK instruction alone takes
almost 70 clock cycles to finish and this adds lots of cost to the
_spin_lock. However _spin_unlock does not use this LOCK instruction and
it uses "movb $1,%0" instead since 4-byte writes on 4-byte aligned
addresses are atomic.


_spin_unlock does not need locked operations because when it is run, the
code is already known to be the only one to hold the lock, so it can
release it without checking what others do.


So I want to rewrite the _spin_lock defined in spinlock.h
(/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock
and make it more efficient.


It does not work. You cannot write an inter-cpu atomic test-and-set with
several unlocked instructions.


#define spin_lock_string \
   "\n1:\t" \
   "cmpb $0,%0\n\t" \
   "jle 2f\n\t" \


==> here, another thread or CPU can get the lock simultaneously.


   "movb $0, %0\n\t" \
   "jmp 3f\n" \
   "2:\t" \
   "rep;nop\n\t" \
   "cmpb $0, %0\n\t" \
   "jle 2b\n\t" \
   "jmp 1b\n" \
   "3:\n\t"

Compared with the original version as follows, LOCK instruction is
removed. I rebuilt the Intel e1000 Gigabit driver with this _spin_lock.
There is about 2% throughput improvement.
#define spin_lock_string \
   "\n1:\t" \
   "lock ; decb %0\n\t" \
   "jns 3f\n" \
   "2:\t" \
   "rep;nop\n\t" \
   "cmpb $0,%0\n\t" \
   "jle 2b\n\t" \
   "jmp 1b\n" \
   "3:\n\t"

Do you think I can get a better performance if I dig further?

Any ideas will be greatly appreciated,


well, of course with those methods you can improve performance, but you
lose the guarantee that you're the only one to get the lock, and that's bad.

another similar method to get a lock in some very controlled environment
is as follows :

 1:  cmp $0, %0
 jne 1b
 mov $CPUID, %0
 membar
 cmp $CPUID, %0
 jne 1b

This only works with same speed CPUs and interrupts disabled. But in today's
environments, this is very risky (hyperthreaded CPUs, etc...). However, this
is often OK for more deterministic CPUs such as microcontrollers.

Regards,
Willy



Re: fdisk: What do plus signs after "Blocks" mean?

2005-07-14 Thread DervishD
   Hi kernel.

 * kernel <[EMAIL PROTECTED]> dixit:
> First 446 bytes are boot code and all
> Next 64 bytes are for 4 partition records, 16 bytes each
> Last 2 bytes are signature 

And that's right, but only for the MBR. If you set up an extended
partition in the MBR, the partition table for that extended partition
is on the boot record of the extended partition. If you just back up
the MBR, you only back up the *declaration* of the extended partition
(where it starts, where it ends, etc.) but NOT the partition table of
the extended partition (that is, the partitions within the extended
partition). For storing that you have to back up the first sector of
the extended partition itself. And you have to do it recursively if
you want to back up any partition setup, no matter how strange.

I hope I've made this clear; it is a bit difficult to explain
without a couple of diagrams O:)
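
In place of a diagram, here is a minimal C sketch of the layout described above: it decodes the four 16-byte records of one 512-byte boot sector and flags extended entries. The struct and function names are hypothetical. The field offsets inside a record (type at byte 4, start LBA at bytes 8..11, sector count at bytes 12..15, little-endian) and the extended type codes 0x05, 0x0f and 0x85 are the conventional ones, stated here as assumptions rather than taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum {
    SECTOR_SIZE   = 512,
    PTABLE_OFFSET = 446,  /* boot code occupies bytes 0..445 */
    PENTRY_SIZE   = 16,   /* four records of 16 bytes each */
    SIG_OFFSET    = 510   /* last two bytes: 0x55 0xAA */
};

/* Hypothetical decoded form of one partition record. */
struct part_entry {
    uint8_t  type;         /* partition type byte */
    uint32_t lba_start;    /* first sector of the partition */
    uint32_t num_sectors;  /* length in sectors */
};

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static inline uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the four records of one boot sector; false if the signature is missing. */
static inline bool parse_boot_record(const uint8_t *sector, struct part_entry out[4])
{
    if (sector[SIG_OFFSET] != 0x55 || sector[SIG_OFFSET + 1] != 0xAA)
        return false;

    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + PTABLE_OFFSET + i * PENTRY_SIZE;
        out[i].type        = e[4];
        out[i].lba_start   = le32(e + 8);
        out[i].num_sectors = le32(e + 12);
    }
    return true;
}

/* Type codes commonly used for extended partitions. */
static inline bool is_extended(uint8_t type)
{
    return type == 0x05 || type == 0x0f || type == 0x85;
}
```

A backup tool would run this on sector 0, then for every entry where is_extended() is true read the first sector of that partition and repeat, saving each sector it visits. The sketch decodes a single sector only and does no disk I/O.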

Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736 | http://www.dervishd.net
http://www.pleyades.net & http://www.gotesdelluna.net
It's my PC and I'll cry if I want to...


[RFC,PATCH] RCU and CONFIG_PREEMPT_RT semi-sane patch

2005-07-14 Thread Paul E. McKenney
Hello!

The attached patch passed about 36 hours of torture test on each of two
4-CPU x86 machines (about 100 passes through the torture-test script), so
am officially declaring it to be semi-sane.  That said, on eight runs of
kernbench+LTP (also on 4-CPU x86 machines), only six passed, and the other
two hung in LTP (but both did make it through five rounds of kernbench).
So there are still some problems in there somewhere.  The hangs were such
that I got no debug info of any sort :-/ , so will be continuing testing.

But this patch might be acceptable to courageous users of the
CONFIG_PREEMPT_RT patch who aren't too concerned about SMP performance
and scalability.  ;-)

The following caveats still apply:

o   Still have heavyweight operations in rcu_read_lock() and
rcu_read_unlock().  Will work on removing the atomic_inc()
and atomic_dec() first, with the memory barriers later.

o   Global callback queues result in poor SMP performance.  On
the list to fix.  Will likely require handling CPU hotplug
(current code is too stupid to need to care about CPU hotplug).

o   Grace-period-detection code is probably too aggressive, but will
worry about that later.  This will interact with OOM on systems
with small memories.

o   There are likely still bugs.  Which might cause occasional
hangs.  ;-)

o   Applies against V0.7.51-27 of Ingo's patch.

However, the code (but not the patch) should work in a stock kernel
as well as in the CONFIG_PREEMPT_RT environment.  Thanks to Steve
Rostedt and Bill Huey for their help with this!

Thoughts?

Thanx, Paul

PS.  Will be on travel next week, so response time may be a bit slow.

Signed-off-by: <[EMAIL PROTECTED]>

diff -urpN -X dontdiff linux-2.6.12-realtime-preempt-V0.7.51-27/fs/proc/proc_misc.c linux-2.6.12-realtime-preempt-V0.7.51-27-ctrRCU/fs/proc/proc_misc.c
--- linux-2.6.12-realtime-preempt-V0.7.51-27/fs/proc/proc_misc.c	2005-07-13 14:52:43.0 -0700
+++ linux-2.6.12-realtime-preempt-V0.7.51-27-ctrRCU/fs/proc/proc_misc.c	2005-07-13 14:54:10.0 -0700
@@ -599,6 +599,38 @@ void create_seq_entry(char *name, mode_t
entry->proc_fops = f;
 }
 
+#ifdef CONFIG_RCU_STATS
+int rcu_read_proc(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+   int len;
+   extern int rcu_read_proc_data(char *page);
+
+   len = rcu_read_proc_data(page);
+   return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+int rcu_read_proc_gp(char *page, char **start, off_t off,
+int count, int *eof, void *data)
+{
+   int len;
+   extern int rcu_read_proc_gp_data(char *page);
+
+   len = rcu_read_proc_gp_data(page);
+   return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+int rcu_read_proc_ptrs(char *page, char **start, off_t off,
+  int count, int *eof, void *data)
+{
+   int len;
+   extern int rcu_read_proc_ptrs_data(char *page);
+
+   len = rcu_read_proc_ptrs_data(page);
+   return proc_calc_metrics(page, start, off, count, eof, len);
+}
+#endif /* #ifdef CONFIG_RCU_STATS */
+
 void __init proc_misc_init(void)
 {
struct proc_dir_entry *entry;
@@ -621,6 +653,11 @@ void __init proc_misc_init(void)
{"cmdline", cmdline_read_proc},
{"locks",   locks_read_proc},
{"execdomains", execdomains_read_proc},
+#ifdef CONFIG_RCU_STATS
+   {"rcustats",rcu_read_proc},
+   {"rcugp",   rcu_read_proc_gp},
+   {"rcuptrs", rcu_read_proc_ptrs},
+#endif /* #ifdef CONFIG_RCU_STATS */
{NULL,}
};
for (p = simple_ones; p->name; p++)
diff -urpN -X dontdiff linux-2.6.12-realtime-preempt-V0.7.51-27/include/linux/rcupdate.h linux-2.6.12-realtime-preempt-V0.7.51-27-ctrRCU/include/linux/rcupdate.h
--- linux-2.6.12-realtime-preempt-V0.7.51-27/include/linux/rcupdate.h	2005-07-13 14:52:43.0 -0700
+++ linux-2.6.12-realtime-preempt-V0.7.51-27-ctrRCU/include/linux/rcupdate.h	2005-07-13 14:54:10.0 -0700
@@ -59,6 +59,7 @@ struct rcu_head {
 } while (0)
 
 
+#ifndef CONFIG_PREEMPT_RCU
 
 /* Global control variables for rcupdate callback mechanism. */
 struct rcu_ctrlblk {
@@ -209,6 +210,18 @@ static inline int rcu_pending(int cpu)
 # define rcu_read_unlock preempt_enable
 #endif
 
+#else /* #ifndef CONFIG_PREEMPT_RCU */
+
+#define rcu_qsctr_inc(cpu)
+#define rcu_bh_qsctr_inc(cpu)
+#define call_rcu_bh(head, rcu) call_rcu(head, rcu)
+
+extern void rcu_read_lock(void);
+extern void rcu_read_unlock(void);
+extern int rcu_pending(int cpu);
+
+#endif /* #else #ifndef CONFIG_PREEMPT_RCU */
+
 /*
  * So where is rcu_write_lock()?  It does not exist, as there is no
  * way for writers to lock out RCU readers.  This is a feature, not
@@ -230,16 +243,22 @@ static inline int 

Re: RT and XFS

2005-07-14 Thread Christoph Hellwig
On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <[EMAIL PROTECTED]> wrote:
> > 
> > > > The whole point of using a semaphore in the pagebuf is because there
> > > > is no tracking of who "owns" the lock so we can actually release it
> > > > in a different context. Semaphores were invented for this purpose,
> > > > and we use them in the way they were intended. ;)
> > > 
> > > > Where is that semaphore spec, is that posix ?  There is a new 
> > > construct called "complete" that is good for this type of stuff too. 
> > > No owner needed , just something running, and something waiting till 
> > > it completes.
> > 
> > wrt. posix, we dont really care about that for kernel-internal 
> > primitives like struct semaphore. So whether it's posix or not has no 
> > relevance.
> 
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
> 
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution. 
> 
> Daniel
> 
---end quoted text---


Re: RT and XFS

2005-07-14 Thread Christoph Hellwig
On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> This reminds me of Documentation/stable_api_nonsense.txt . That no one
> should really be dependent on a particular kernel API doing a particular
> thing. The kernel is play dough for the kernel hacker (as it should be),
> including kernel semaphores.
> 
> So we can change whatever we want, and make no excuses, as long as we
> fix the rest of the kernel to work with our change. That seems pretty
> sensible , because Linux should be an evolution. 

Daniel, get a fucking clue.  Read some CS 101 literature on what a semaphore
is defined to be.  If you want PI singing dancing blinking christmas tree
locking primitives call them a mutex, but not a semaphore.



Re: pci_size() error condition

2005-07-14 Thread John Rose
> It was always effectual for IO where the mask is 0x.

Okay, point taken :)  So for cases of base == maxbase, why would we ever
want to return a nonzero value?  What is the intended purpose of the
second part of that conditional?



Re: [rfc patch 2/2] direct-io: remove address alignment check

2005-07-14 Thread Daniel McNeil
On Thu, 2005-07-14 at 06:18, Andi Kleen wrote:
> Daniel McNeil <[EMAIL PROTECTED]> writes:
> 
> > This patch relaxes the direct i/o alignment check so that user addresses
> > do not have to be a multiple of the device block size.
> 
> The original reason for this limit was that lots of drivers
> (not only IDE) explode when you give them odd sizes. Sometimes
> it is even worse.
> 
> I doubt all of them have been fixed.
> 
> Very risky change.
> 

That is exactly why I made this a separate patch, so that we
can test and find out where the problems are and work to fix
them.

Are there problems only with odd sizes, or do drivers have problems
with non-512 sizes?

Allowing 4-byte aligned user addresses would be a good step
forward, since it looks like malloc() returns 4-byte aligned 
addresses.
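
To make the two policies concrete, here is a small C sketch of the address check being discussed. The patch itself is not reproduced in this message, so this is not its code: dio_addr_ok_strict() and dio_addr_ok_relaxed() are hypothetical names, and the 512-byte block size and 4-byte minimum are just the figures mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static inline bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Current rule (as described): buffer address must be block-size aligned. */
static inline bool dio_addr_ok_strict(const void *buf, size_t blocksize)
{
    return is_aligned((uintptr_t)buf, blocksize);
}

/* Relaxed rule under discussion: 4-byte alignment is enough. */
static inline bool dio_addr_ok_relaxed(const void *buf)
{
    return is_aligned((uintptr_t)buf, 4);
}
```

A buffer at address 1028 fails the strict check for a 512-byte block size but passes the relaxed one; whether a given driver copes with such a buffer is exactly the open question.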

Daniel



Re: Linux On-Demand Network Access (LODNA)

2005-07-14 Thread Alan Cox
Take a look at FUSE, it should be able to do all you need



Re: RT and XFS

2005-07-14 Thread Daniel Walker
On Thu, 2005-07-14 at 07:23 +0200, Ingo Molnar wrote:
> * Daniel Walker <[EMAIL PROTECTED]> wrote:
> 
> > > The whole point of using a semaphore in the pagebuf is because there
> > > is no tracking of who "owns" the lock so we can actually release it
> > > in a different context. Semaphores were invented for this purpose,
> > > and we use them in the way they were intended. ;)
> > 
> > Where is that semaphore spec, is that posix ?  There is a new 
> > construct called "complete" that is good for this type of stuff too. 
> > No owner needed , just something running, and something waiting till 
> > it completes.
> 
> wrt. posix, we dont really care about that for kernel-internal 
> primitives like struct semaphore. So whether it's posix or not has no 
> relevance.

This reminds me of Documentation/stable_api_nonsense.txt . That no one
should really be dependent on a particular kernel API doing a particular
thing. The kernel is play dough for the kernel hacker (as it should be),
including kernel semaphores.

So we can change whatever we want, and make no excuses, as long as we
fix the rest of the kernel to work with our change. That seems pretty
sensible , because Linux should be an evolution. 

Daniel



Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Lee Revell
On Thu, 2005-07-14 at 08:02 -0700, Christoph Lameter wrote:
> I doubt that increasing the timer frequency is the way to go to solve 
> these issues. HZ should be as low as possible and we should strive for
> a tickless system.

Agreed.  Most of those applications are driven by their own interrupt
source anyway.

I do think Linus' proposal, or even copying what Windows does, would be
a big improvement over the fixed tick rate.

Lee



Re: rcu-refcount stacker performance

2005-07-14 Thread Paul E. McKenney
On Thu, Jul 14, 2005 at 09:21:07AM -0500, [EMAIL PROTECTED] wrote:
> On July 8 I sent out a patch which re-implemented the rcu-refcounting
> of the LSM list in stacker for the sake of supporting safe security
> module unloading.  (patch reattached here for convenience)  Here are
> some performance results with and without that patch.  Tests were run
> on a 16-way ppc64 machine.  Dbench was run 50 times, and kernbench
> and reaim were run 10 times, and intervals are 95% confidence half-
> intervals.
> 
> These results seem pretty poor.  I'm now wondering whether this is
> really necessary.  David Wheeler's original stacker had an option
> of simply waiting a while after a module was taken out of the list
> of active modules before freeing the modules.  Something like that
> is of course one option.  I'm hoping we can also take advantage of
> some already known module state info to be a little less coarse
> about it.  For instance, sys_delete_module() sets m->state to
> MODULE_STATE_GOING before calling mod->exit().  If in place of
> doing atomic_inc(&m->use), stacker skipped the m->hook() if
> m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
> any m->hook() should be finished before sys_delete_module() gets
> to free_module(mod).  This seems to require adding a struct
> module argument to security/security:mod_reg_security() so an LSM
> can pass itself along.
> 
> So I'll try that next.  Hopefully by avoiding the potential cache
> line bounces which atomic_inc(&m->use) bring, this should provide
> far better performance.

My guess is that the reference count is indeed costing you quite a
bit.  I glanced quickly at the patch, and most of the uses seem to
be of the form:

increment ref count
rcu_read_lock()
do something
rcu_read_unlock()
decrement ref count

Can't these cases rely solely on rcu_read_lock()?  Why do you also
need to increment the reference count in these cases?
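
For reference, the alternative floated in the quoted text (skip a module's hook unless its state is LIVE, instead of bumping a per-module reference count) can be sketched in plain userspace C as below. This only illustrates the filtering logic. The types and names are stand-ins, not stacker's or the module loader's, and the sketch does not by itself show that a sleeping hook is safe against module unload; in the kernel the list walk would also sit under rcu_read_lock().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's module states. */
enum mod_state { MOD_LIVE, MOD_COMING, MOD_GOING };

/* Hypothetical list element: one stacked security module. */
struct lsm_entry {
    enum mod_state state;
    int (*hook)(void);
    struct lsm_entry *next;
};

/* Example hooks, present only so the walk has something to call. */
static int hook_ok(void)   { return 0; }
static int hook_fail(void) { return -1; }

/*
 * Call the hook of every LIVE entry, skipping the others without
 * writing to them (so no shared cache line is dirtied per call).
 * Returns the first nonzero hook result, or 0 if all hooks passed;
 * *called counts the hooks that were actually invoked.
 */
static int call_live_hooks(const struct lsm_entry *head, int *called)
{
    *called = 0;
    for (const struct lsm_entry *m = head; m != NULL; m = m->next) {
        if (m->state != MOD_LIVE)
            continue;           /* module is loading or unloading */
        (*called)++;
        int ret = m->hook();
        if (ret != 0)
            return ret;         /* return as soon as an error is returned */
    }
    return 0;
}
```

The point of the variant is that the fast path only reads m->state, where atomic_inc() writes to a counter that every CPU shares.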

Thanx, Paul

> thanks,
> -serge
> 
> Dbench (throughput, larger is better)
> 
> plain stacker:1531.448400 +/- 15.791116
> stacker with rcu: 1408.056200 +/- 12.597277
> 
> Kernbench (runtime, smaller is better)
> 
> plain stacker:52.341000  +/- 0.184995
> stacker with rcu: 53.722000 +/- 0.161473
> 
> Reaim (numjobs, larger is better) (gnuplot-friendly format)
> plain stacker:
> --
> Numforked   jobs/minute     95% CI
> 1           106662.857000   5354.267865
> 3           301628.571000   6297.121934
> 5           488142.858000   16031.685536
> 7           673200.00       23994.030784
> 9           852428.57       31485.607271
> 11          961714.29       0.00
> 13          1108157.144000  27287.525982
> 15          1171178.571000  49790.796869
> 
> Reaim (numjobs, larger is better) (gnuplot-friendly format)
> stacker with rcu:
> --
> Numforked   jobs/minute     95% CI
> 1           100542.857000   2099.040645
> 3           266657.139000   6297.121934
> 5           398892.858000   12023.765252
> 7           467670.00       14911.383385
> 9           418648.352000   11665.751441
> 11          396825.00       8700.115252
> 13          357480.912000   7567.947838
> 15          337571.428000   2332.267703
> 
> Patch:
> 
> Index: linux-2.6.12/security/stacker.c
> ===
> --- linux-2.6.12.orig/security/stacker.c  2005-07-08 13:43:15.0 -0500
> +++ linux-2.6.12/security/stacker.c   2005-07-08 16:21:54.0 -0500
> @@ -33,13 +33,13 @@
>  
>  struct module_entry {
>   struct list_head lsm_list;  /* list of active lsms */
> - struct list_head all_lsms; /* list of all lsms */
>   char *module_name;
>   int namelen;
>   struct security_operations module_operations;
> + struct rcu_head m_rcu;
> + atomic_t use;
>  };
>  static struct list_head stacked_modules;  /* list of stacked modules */
> -static struct list_head all_modules;  /* list of all modules, including freed */
>  
>  static short sysfsfiles_registered;
>  
> @@ -84,6 +84,32 @@ MODULE_PARM_DESC(debug, "Debug enabled o
>   * We return as soon as an error is returned.
>   */
>  
> +static inline void stacker_free_module(struct module_entry *m)
> +{
> + kfree(m->module_name);
> + kfree(m);
> +}
> +
> +/*
> + * Version of stacker_free_module called from call_rcu
> + */
> +static void free_mod_fromrcu(struct rcu_head *head)
> +{
> + struct module_entry *m;
> +
> + m = container_of(head, struct module_entry, m_rcu);
> + stacker_free_module(m);
> +}
> +
> +static void stacker_del_module(struct rcu_head *head)
> +{
> + struct module_entry *m;
> + 
> + m = container_of(head, struct module_entry, m_rcu);
> + if 

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Christoph Lameter
On Thu, 14 Jul 2005, Lee Revell wrote:

> On Thu, 2005-07-14 at 10:38 +0200, Ingo Molnar wrote:
> >  - there are real-time applications (robotic environments: fast rotating
> >tools, media and mobile/phone applications, etc.) that want 10
> >usecs precision. If such users increased HZ to 100,000 or even
> >1,000,000, the current timer implementation would start to creak: e.g.
> >jiffies on 32-bit systems would wrap around in 11 hours or 1.1 hours.
> >(To solve this cleanly, pretty much the only solution seems to be to
> >increase the timeout to a 64 bit value. A non-issue for 64-bit
> >systems, that's why i think we could eventually look at this 
> >possibility, once all the other problems are hashed out.)
> > 
> 
> Those types of systems will not be 64 bit for many, many years, if
> ever...

Linux can already provide a response time within < 3 usecs from user space 
using f.e. the Altix RTC driver which can generate an interrupt that then 
sends a signal to an application. The Altix RTC clock is supported via POSIX
timer syscalls and can be accessed using CLOCK_SGI_CYCLE. This has been 
available in Linux since last fall and events can be specified with 50 
nanoseconds accuracy.

Other clock sources like  HPET could do the same if someone would be 
willing to provide the hookup to the posix layer.

I doubt that increasing the timer frequency is the way to go to solve 
these issues. HZ should be as low as possible and we should strive for a 
tickless system.


Re: Merging relayfs?

2005-07-14 Thread Tom Zanussi
Roman Zippel writes:
 > Hi,
 > 
 > On Mon, 11 Jul 2005, Andrew Morton wrote:
 > 
 > > > > Hi Andrew, can you please merge relayfs?  It provides a low-overhead
 > > > > logging and buffering capability, which does not currently exist in
 > > > > the kernel.
 > > > 
 > > > While the code is pretty nicely in shape it seems rather pointless to
 > > > merge until an actual user goes with it.
 > > 
 > > Ordinarily I'd agree.  But this is a bit like kprobes - it's a funny thing
 > > which other kernel features rely upon, but those features are often ad-hoc
 > > and aren't intended for merging.
 > 
 > I agree with Christoph, I'd like to see a small (and useful) example 
 > included, which can be used as reference. relayfs clients still need some 
 > code of their own to communicate with user space. If I look at the example 
 > code I'm not really sure netlink is a good way to go as control channel.
 > kprobes has a rather simple interface, relayfs is more complex and I think 
 > it's a good idea to provide some sane and complete example code to copy 
 > from.
 > 

The netlink control channel seems to work very well, but I can
certainly change the examples to use something different.  Could you
suggest something?

 > Looking through the patch there are still a few areas I'm concerned about:
 > - the usage of atomic_t looks a little silly, there is only a single 
 > writer and probably needs some cache line optimisations

The only things that are atomic are the counts of produced and
consumed buffers and these are only ever updated or read in the slow
buffer-switch path.  They're atomic because if they weren't, wouldn't
it be possible for the client to read an unfinished value if the
producer was in the middle of updating it?

 > - I would prefer "unsigned int" over just "unsigned"
 > - the padding/commit arrays can be easily managed by the client

Yes, I can move them out and update the examples to reflect that, but
I thought that if this was something that most clients would need to
do, it made some sense to keep it in relayfs and avoid duplication in
the clients.

 > - overwrite mode can be implemented via the buffer switch callback

The buffer switch callback is already where this is handled, unless
you're thinking of something else - one of the first checks in the
buffer switch is relay_buf_full(), which always returns 0 if the
buffer is in overwrite mode.

Tom




Re: resuming swsusp twice

2005-07-14 Thread Stefan Seyfried
Andy Isaacson wrote:
> Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
> then resumed.  It ran fine overnight, including a fair amount of IO
> (running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
> etc).  This morning I did a swsusp:
> 
>   echo shutdown > /sys/power/disk
>   echo disk > /sys/power/state
> 
> and got a panic along the lines of "Unable to find swap space, try

a panic? it should only be an error message, but the machine should
still be alive.

> swapon -a".  Unfortunately I was in a hurry and didn't record the error
> messages.  I powered off, then a few minutes later powered on again.

Powered off hard or "shutdown -h now"?

> At this point, it resumed *to the swsusp state from yesterday*!
> As soon as I realized what had happened, I powered off (not
> shutdown) and rebooted.

Good.

> On the next boot it did not find a swsusp signature and booted normally;
> ext3 did a normal recovery and seemed OK, but I was suspicious and did a
> fsck -f, which revealed a lot of damage; most of the damage seemed to be

this is expected in this case, unfortunately.

> in the hg repo which had been pulled from www.kernel.org/hg/.
> 
> It's extremely unfortunate that there is *any* failure mode in swsusp
> that can result in this behavior.

I of course won't say that this cannot happen, but by design, the swsusp
signature is invalidated even before reading the image, so theoretically
it should not happen.

> I will try to reproduce, but I'm curious if anyone else has seen this.

i have not seen anything like that, but i am not always running the
latest & greatest kernel.
-- 
Stefan Seyfried  \ "I didn't want to write for pay. I
QA / R Team Mobile Devices  \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \-- Leonard Cohen


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Lee Revell
On Thu, 2005-07-14 at 10:38 +0200, Ingo Molnar wrote:
>  - there are real-time applications (robotic environments: fast rotating
>tools, media and mobile/phone applications, etc.) that want 10
>usecs precision. If such users increased HZ to 100,000 or even
>1,000,000, the current timer implementation would start to creak: e.g.
>jiffies on 32-bit systems would wrap around in 11 hours or 1.1 hours.
>(To solve this cleanly, pretty much the only solution seems to be to
>increase the timeout to a 64 bit value. A non-issue for 64-bit
>systems, that's why i think we could eventually look at this 
>possibility, once all the other problems are hashed out.)
> 

Those types of systems will not be 64 bit for many, many years, if
ever...

Lee
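
The wraparound figures quoted above follow from dividing the 2^32 values of a 32-bit tick counter by the tick rate. A small C sketch of that arithmetic, with made-up function names:

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a 32-bit tick counter wraps at hz ticks per second. */
static inline double wrap_seconds_32bit(uint32_t hz)
{
    return 4294967296.0 / (double)hz;   /* 2^32 ticks */
}

/* The same interval expressed in hours. */
static inline double wrap_hours_32bit(uint32_t hz)
{
    return wrap_seconds_32bit(hz) / 3600.0;
}
```

For HZ = 100,000 this gives about 11.9 hours and for HZ = 1,000,000 about 1.19 hours, in line with the "11 hours or 1.1 hours" estimate; at HZ = 1000 the same formula gives roughly 49.7 days.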



[PATCH] visws: reexport pm_power_off

2005-07-14 Thread Alexey Dobriyan
On Wednesday 13 July 2005 17:38, James Bottomley wrote:
> [PATCH] Remove i386_ksyms.c, almost
> 
> made files like smp.c do their own EXPORT_SYMBOLS.  This means that all
> subarchitectures that override these symbols now have to do the exports
> themselves.  This patch adds the exports for voyager (which is the most
> affected since it has a separate smp harness).  However, someone should
> audit all the other subarchitectures to see if any others got broken.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 arch/i386/mach-visws/reboot.c |1 +
 1 files changed, 1 insertion(+)

--- linux-vanilla/arch/i386/mach-visws/reboot.c 2005-07-13 19:45:59.0 +0400
+++ linux-visws/arch/i386/mach-visws/reboot.c   2005-07-14 18:53:23.0 +0400
@@ -7,6 +7,7 @@
 #include "piix4.h"
 
 void (*pm_power_off)(void);
+EXPORT_SYMBOL(pm_power_off);
 
 void machine_restart(char * __unused)
 {


Re: XFS corruption on move from xscale to i686

2005-07-14 Thread Christoph Hellwig
On Thu, Jul 14, 2005 at 05:45:15PM +0300, Yura Pakhuchiy wrote:
> Yes, but a lot of people use older versions of compilers and suffer
> from this bug.
> I personally was very unhappy when I lost my data.

then host the patch somewhere and make sure to apply it.



Re: XFS corruption on move from xscale to i686

2005-07-14 Thread Yura Pakhuchiy
2005/7/14, Christoph Hellwig <[EMAIL PROTECTED]>:
> On Thu, Jul 14, 2005 at 04:50:01PM +0300, Yura Pakhuchiy wrote:
> > 2005/7/14, Nathan Scott <[EMAIL PROTECTED]>:
> > > On Wed, Jul 13, 2005 at 06:22:28PM +0300, Yura Pakhuchiy wrote:
> > > > I found a patch by Greg Ungreger to fix this problem, but why is it still
> > > > not in mainline? Or it's a gcc problem and should be fixed by gcc folks?
> > >
> > > Yes, IIRC the patch was incorrect for other platforms, and it sure
> > > looked like an arm-specific gcc problem (this was ages back, so
> > > perhaps it's fixed by now).
> >
> > AFAIR gcc-3.4.3 was released after this conversation took place at 
> > linux-xfs,
> > maybe add something like this:
> >
> > #ifdef XSCALE
> > /* We need this because some gcc versions for xscale are broken. */
> > [patched version here]
> > #else
> > [original version here]
> > #endif
> 
> no, just fix your compiler or let the gcc folks do it.  Did anyone of
> the arm folks ever open a PR at the gcc bugzilla with a reproduced
> testcase?  You're never get your compiler fixed with that attitude.

Yes, but a lot of people use older versions of compilers and suffer
from this bug.
I personally was very unhappy when I lost my data.

Best regards,
Yura


Re: serial: 8250 fails to detect Exar XR16L2551 correctly

2005-07-14 Thread David Vrabel
Alex Williamson wrote:
> 
> David, would you mind
> trying this on the XR16L255x part? (ie. don't use console=ttyS, use
> console=uart,...)  Thanks,

I wasn't even aware you could do this...

These are the serial ports I have:

ttyS0 at MMIO 0xc800 (irq = 15) is a XScale   IXP425 internal
ttyS1 at MMIO 0xc8001000 (irq = 13) is a XScale "   "
ttyS2 at MMIO 0x5300 (irq = 21) is a XR16550  XR16L2551
ttyS3 at MMIO 0x5308 (irq = 21) is a XR16550  "

I tried console=uart,mmio,0x5300,115200 and my board didn't print
anything to the console and the boot failed somewhere before starting
network (I don't know exactly where or why since I couldn't see any
messages).  Using console=ttyS2,115200 works fine.

What's 8250_early.c for anyway?  console=ttyS... has always worked fine
for me.

David Vrabel
-- 
David Vrabel, Design Engineer

Arcom, Clifton Road   Tel: +44 (0)1223 411200 ext. 3233
Cambridge CB1 7EA, UK Web: http://www.arcom.com/


Re: XFS corruption on move from xscale to i686

2005-07-14 Thread Christoph Hellwig
On Thu, Jul 14, 2005 at 04:50:01PM +0300, Yura Pakhuchiy wrote:
> 2005/7/14, Nathan Scott <[EMAIL PROTECTED]>:
> > On Wed, Jul 13, 2005 at 06:22:28PM +0300, Yura Pakhuchiy wrote:
> > > I found patch by Greg Ungreger to fix this problem, but why it's still
> > > not in mainline? Or it's a gcc problem and should be fixed by gcc folks?
> > 
> > Yes, IIRC the patch was incorrect for other platforms, and it sure
> > looked like an arm-specific gcc problem (this was ages back, so
> > perhaps its fixed by now).
> 
> AFAIR gcc-3.4.3 was released after this conversation take place at linux-xfs,
> maybe add something like this:
> 
> #ifdef XSCALE
> /* We need this because some gcc versions for xscale are broken. */
> [patched version here]
> #else
> [original version here]
> #endif

no, just fix your compiler or let the gcc folks do it.  Did any of
the arm folks ever open a PR at the gcc bugzilla with a reproduced
testcase?  You'll never get your compiler fixed with that attitude.



[PATCH] rocket.c: Fix ldisc ref count handling

2005-07-14 Thread Michal Ostrowski
If bailing out because there is nothing to receive in rp_do_receive(),
tty_ldisc_deref is not called.  Failure to do so increases the ref count 
and causes release_dev() to hang since it can't get the ref count to 0.

---

Signed-off-by: Michal Ostrowski <[EMAIL PROTECTED]>

 drivers/char/rocket.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/char/rocket.c b/drivers/char/rocket.c
--- a/drivers/char/rocket.c
+++ b/drivers/char/rocket.c
@@ -355,7 +355,7 @@ static void rp_do_receive(struct r_port 
ToRecv = space;
 
if (ToRecv <= 0)
-   return;
+   goto done;
 
/*
 * if status indicates there are errored characters in the
@@ -437,6 +437,7 @@ static void rp_do_receive(struct r_port 
}
/*  Push the data up to the tty layer */
ld->receive_buf(tty, tty->flip.char_buf, tty->flip.flag_buf, count);
+ done:
tty_ldisc_deref(ld);
 }
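
The patch above is an instance of the usual single-exit cleanup idiom: once
a reference has been taken, every path out of the function has to pass
through the matching release.  A minimal userspace sketch of that pattern
follows.  The names (struct ldisc_model, ldisc_ref, ldisc_deref, do_receive)
are invented for illustration and are not taken from rocket.c or the tty
layer; a plain counter stands in for the real line-discipline reference.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted line discipline. */
struct ldisc_model {
	int refcount;
};

static struct ldisc_model *ldisc_ref(struct ldisc_model *ld)
{
	ld->refcount++;
	return ld;
}

static void ldisc_deref(struct ldisc_model *ld)
{
	assert(ld->refcount > 0);
	ld->refcount--;
}

/* Returns the number of characters "received"; drops the ref on every path. */
static int do_receive(struct ldisc_model *ld_in, int to_recv)
{
	struct ldisc_model *ld = ldisc_ref(ld_in);
	int count = 0;

	if (to_recv <= 0)
		goto done;	/* a bare "return" here would leak the reference */

	count = to_recv;	/* placeholder for the real receive work */
done:
	ldisc_deref(ld);
	return count;
}
```

With this shape the counter is back at its starting value whether or not
there was anything to receive, which is what release_dev() depends on.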
 




Re: Thread_Id

2005-07-14 Thread Robert Hancock

RVK wrote:
> Ian Campbell wrote:
> > On Thu, 2005-07-14 at 15:36 +0530, RVK wrote:
> > > bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;
> >
> > That's an implementation detail which you cannot determine any
> > information from.
> >
> > What Arjan is saying is that pthread_t is a cookie -- this means that
> > you cannot interpret it in any way, it is just a "thing" which you can
> > pass back to the API, that pthread_t happens to be typedef'd to unsigned
> > long int is irrelevant.
>
> Do you want to say for both 2.6.x and 2.4.x I should interpret that way ?
>
> rvk

Indeed, for ANY OS using pthreads it should be interpreted that way..
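
In practice that means thread handles are only ever compared through the
API.  A small sketch in C, assuming nothing beyond POSIX pthread_self() and
pthread_equal(); the helper name is_current_thread is made up for
illustration:

```c
#include <pthread.h>

/*
 * Treat pthread_t as an opaque cookie: do not print it, cast it, or
 * compare it with ==.  POSIX provides pthread_equal() for comparison.
 */
static int is_current_thread(pthread_t candidate)
{
	return pthread_equal(candidate, pthread_self()) != 0;
}
```

Code written this way does not care whether the library stores a pointer,
an index, or anything else in the handle.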


Re: High irq load (Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt)

2005-07-14 Thread Peter Osterlund
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Wed, 13 Jul 2005, Jan Engelhardt wrote:
> > 
> > No, some kernel code causes a triple-fault-and-reboot when the HZ is >=
> > 10KHz. Maybe the highest possible value is 8192 Hz, not sure.
> 
> Can you post the triple-fault message? It really shouldn't triple-fault, 
> although it _will_ obviously spend all time just doing timer interrupts, 
> so it shouldn't get much (if any) real work done either.
...
> There should be no conceptual "highest possible HZ", although there are 
> certainly obvious practical limits to it (both on the timer hw itself, and 
> just the fact that at some point we'll spend all time on the timer 
> interrupt and won't get anything done..)

HZ=1 appears to work fine here after some hacks to avoid
over/underflows in integer arithmetics. gkrellm reports about 3-4% CPU
usage when the system is idle, on a 3.07 GHz P4.
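
One class of integer-arithmetic problem of the kind mentioned above comes
from expressions that divide a constant by HZ: once HZ exceeds the constant,
the quotient truncates to zero and the outer division faults.  A minimal
sketch of the effect, using the 5000/HZ term from the bogomips code in the
patch below.  The function names are illustrative only, and the run-time
"if" stands in for the compile-time "#if HZ <= 5000" split:

```c
/* Divisor used for the fractional bogomips digits: 5000/HZ, truncated. */
static unsigned long frac_divisor(unsigned long hz)
{
	return 5000UL / hz;
}

/*
 * Guarded variant: divide while the divisor is non-zero, otherwise
 * multiply by HZ/5000 instead.
 */
static unsigned long frac_digits(unsigned long loops_per_jiffy,
				 unsigned long hz)
{
	if (hz <= 5000UL)
		return (loops_per_jiffy / (5000UL / hz)) % 100UL;
	return (loops_per_jiffy * (hz / 5000UL)) % 100UL;
}
```

For any HZ above 5000 frac_divisor() yields 0, so the unguarded form would
divide by zero.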

---

 Makefile|2 +-
 arch/i386/kernel/cpu/proc.c |6 ++
 fs/nfsd/nfssvc.c|2 +-
 include/linux/jiffies.h |6 ++
 include/linux/nfsd/stats.h  |4 
 include/linux/timex.h   |2 +-
 include/net/tcp.h   |   12 +---
 init/calibrate.c|   21 +
 kernel/Kconfig.hz   |6 ++
 kernel/timer.c  |4 ++--
 net/ipv4/netfilter/ip_conntrack_proto_tcp.c |2 +-
 11 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 13
-EXTRAVERSION =-rc3
+EXTRAVERSION =-rc3-test
 NAME=Woozy Numbat
 
 # *DOCUMENTATION*
diff --git a/arch/i386/kernel/cpu/proc.c b/arch/i386/kernel/cpu/proc.c
--- a/arch/i386/kernel/cpu/proc.c
+++ b/arch/i386/kernel/cpu/proc.c
@@ -128,9 +128,15 @@ static int show_cpuinfo(struct seq_file 
 x86_cap_flags[i] != NULL )
seq_printf(m, " %s", x86_cap_flags[i]);
 
+#if HZ <= 5000
seq_printf(m, "\nbogomips\t: %lu.%02lu\n\n",
 c->loops_per_jiffy/(50/HZ),
 (c->loops_per_jiffy/(5000/HZ)) % 100);
+#else
+   seq_printf(m, "\nbogomips\t: %lu.%02lu\n\n",
+c->loops_per_jiffy/(50/HZ),
+(c->loops_per_jiffy*(HZ/5000)) % 100);
+#endif
 
return 0;
 }
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -160,7 +160,7 @@ update_thread_usage(int busy_threads)
decile = busy_threads*10/nfsdstats.th_cnt;
if (decile>0 && decile <= 10) {
diff = nfsd_last_call - prev_call;
-   if ( (nfsdstats.th_usage[decile-1] += diff) >= NFSD_USAGE_WRAP)
+   if ( (nfsdstats.th_usage[decile-1] += diff) >= NFSD_USAGE_WRAP) 
nfsdstats.th_usage[decile-1] -= NFSD_USAGE_WRAP;
if (decile == 10)
nfsdstats.th_fullcnt++;
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -38,6 +38,12 @@
 # define SHIFT_HZ  9
 #elif HZ >= 768 && HZ < 1536
 # define SHIFT_HZ  10
+#elif HZ >= 1536 && HZ < 3072
+# define SHIFT_HZ  11
+#elif HZ >= 3072 && HZ < 6144
+# define SHIFT_HZ  12
+#elif HZ >= 6144 && HZ < 12288
+# define SHIFT_HZ  13
 #else
 # error You lose.
 #endif
diff --git a/include/linux/nfsd/stats.h b/include/linux/nfsd/stats.h
--- a/include/linux/nfsd/stats.h
+++ b/include/linux/nfsd/stats.h
@@ -30,7 +30,11 @@ struct nfsd_stats {
 };
 
 /* thread usage wraps very million seconds (approx one fortnight) */
+#if HZ < 2048
 #defineNFSD_USAGE_WRAP (HZ*100)
+#else
+#defineNFSD_USAGE_WRAP (2048*100)
+#endif
 
 #ifdef __KERNEL__
 
diff --git a/include/linux/timex.h b/include/linux/timex.h
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -90,7 +90,7 @@
  *
  * FINENSEC is 1 ns in SHIFT_UPDATE units of the time_phase variable.
  */
-#define SHIFT_SCALE 22 /* phase scale (shift) */
+#define SHIFT_SCALE 25 /* phase scale (shift) */
 #define SHIFT_UPDATE (SHIFT_KG + MAXTC) /* time offset scale (shift) */
 #define SHIFT_USEC 16  /* frequency offset scale (shift) */
 #define FINENSEC (1L << (SHIFT_SCALE - 10)) /* ~1 ns in phase units */
diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -486,8 +486,8 @@ static __inline__ int tcp_sk_listen_hash
so that we select tick to get range about 4 seconds.
  */
 
-#if HZ <= 16 || HZ > 4096
-# error Unsupported: HZ <= 16 or HZ > 4096
+#if HZ <= 16
+# error Unsupported: HZ <= 16
 #elif HZ <= 32
 # define TCP_TW_RECYCLE_TICK (5+2-TCP_TW_RECYCLE_SLOTS_LOG)
 #elif HZ <= 64
@@ -502,8 +502,14 @@ static __inline__ int tcp_sk_listen_hash
 

rcu-refcount stacker performance

2005-07-14 Thread serue
On July 8 I sent out a patch which re-implemented the rcu-refcounting
of the LSM list in stacker for the sake of supporting safe security
module unloading.  (patch reattached here for convenience)  Here are
some performance results with and without that patch.  Tests were run
on a 16-way ppc64 machine.  Dbench was run 50 times, and kernbench
and reaim were run 10 times, and intervals are 95% confidence half-
intervals.

These results seem pretty poor.  I'm now wondering whether this is
really necessary.  David Wheeler's original stacker had an option
of simply waiting a while after a module was taken out of the list
of active modules before freeing the modules.  Something like that
is of course one option.  I'm hoping we can also take advantage of
some already known module state info to be a little less coarse
about it.  For instance, sys_delete_module() sets m->state to
MODULE_STATE_GOING before calling mod->exit().  If in place of
doing atomic_inc(&m->use), stacker skipped the m->hook() if
m->state!=MODULE_STATE_LIVE, then it may be safe to assume that
any m->hook() should be finished before sys_delete_module() gets
to free_module(mod).  This seems to require adding a struct
module argument to security/security:mod_reg_security() so an LSM
can pass itself along.

So I'll try that next.  Hopefully by avoiding the potential cache
line bounces which atomic_inc(&m->use) brings, this should provide
far better performance.
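
A rough userspace model of that idea, to make the control flow concrete.
Everything here is a sketch under assumptions: struct lsm_entry, call_hooks,
count_hook and the array standing in for the LSM list are invented for
illustration, and only the state names echo the kernel's MODULE_STATE_LIVE
and MODULE_STATE_GOING.  It says nothing about whether the check is
race-free against sys_delete_module(), which is the open question.

```c
#include <stddef.h>

enum module_state_model { STATE_LIVE, STATE_COMING, STATE_GOING };

struct lsm_entry {
	enum module_state_model state;	/* state of the owning module */
	int (*hook)(void *arg);
};

/* Example hook: counts how often it was invoked. */
static int count_hook(void *arg)
{
	(*(int *)arg)++;
	return 0;
}

/* Call the hook of every live module; return how many hooks were called. */
static int call_hooks(struct lsm_entry *list, size_t n, void *arg)
{
	int called = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].state != STATE_LIVE)
			continue;	/* loading or unloading: skip, no atomic_inc */
		list[i].hook(arg);
		called++;
	}
	return called;
}
```

The point of the design is that the fast path only reads per-module state
and never writes a shared counter.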

thanks,
-serge

Dbench (throughput, larger is better)

plain stacker:1531.448400 +/- 15.791116
stacker with rcu: 1408.056200 +/- 12.597277

Kernbench (runtime, smaller is better)

plain stacker:52.341000  +/- 0.184995
stacker with rcu: 53.722000 +/- 0.161473
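
For scale, the relative change between the two configurations can be worked
out from the means above.  This sketch ignores the confidence intervals, and
relative_change is an illustrative helper, not part of any benchmark tool:

```c
/* Relative change from baseline to measured, e.g. -0.08 for an 8% drop. */
static double relative_change(double baseline, double measured)
{
	return (measured - baseline) / baseline;
}
```

With the figures quoted, Dbench throughput drops by roughly 8% and
Kernbench runtime rises by roughly 2.6%.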

Reaim (numjobs, larger is better) (gnuplot-friendly format)
plain stacker:
--
Numforked   jobs/minute      95% CI
1           106662.857000    5354.267865
3           301628.571000    6297.121934
5           488142.858000    16031.685536
7           673200.00        23994.030784
9           852428.57        31485.607271
11          961714.29        0.00
13          1108157.144000   27287.525982
15          1171178.571000   49790.796869

Reaim (numjobs, larger is better) (gnuplot-friendly format)
stacker with rcu:
--
Numforked   jobs/minute      95% CI
1           100542.857000    2099.040645
3           266657.139000    6297.121934
5           398892.858000    12023.765252
7           467670.00        14911.383385
9           418648.352000    11665.751441
11          396825.00        8700.115252
13          357480.912000    7567.947838
15          337571.428000    2332.267703

Patch:

Index: linux-2.6.12/security/stacker.c
===
--- linux-2.6.12.orig/security/stacker.c 2005-07-08 13:43:15.0 -0500
+++ linux-2.6.12/security/stacker.c 2005-07-08 16:21:54.0 -0500
@@ -33,13 +33,13 @@
 
 struct module_entry {
struct list_head lsm_list;  /* list of active lsms */
-   struct list_head all_lsms; /* list of all lsms */
char *module_name;
int namelen;
struct security_operations module_operations;
+   struct rcu_head m_rcu;
+   atomic_t use;
 };
 static struct list_head stacked_modules;  /* list of stacked modules */
-static struct list_head all_modules;  /* list of all modules, including freed */
 
 static short sysfsfiles_registered;
 
@@ -84,6 +84,32 @@ MODULE_PARM_DESC(debug, "Debug enabled o
  * We return as soon as an error is returned.
  */
 
+static inline void stacker_free_module(struct module_entry *m)
+{
+   kfree(m->module_name);
+   kfree(m);
+}
+
+/*
+ * Version of stacker_free_module called from call_rcu
+ */
+static void free_mod_fromrcu(struct rcu_head *head)
+{
+   struct module_entry *m;
+
+   m = container_of(head, struct module_entry, m_rcu);
+   stacker_free_module(m);
+}
+
+static void stacker_del_module(struct rcu_head *head)
+{
+   struct module_entry *m;
+   
+   m = container_of(head, struct module_entry, m_rcu);
+   if (atomic_dec_and_test(&m->use))
+   stacker_free_module(m);
+}
+
 #define stack_for_each_entry(pos, head, member)                        \
        for (pos = list_entry((head)->next, typeof(*pos), member);      \
                &pos->member != (head);                                 \
@@ -93,16 +119,27 @@ MODULE_PARM_DESC(debug, "Debug enabled o
 /* to make this safe for module deletion, we would need to
  * add a reference count to m as we had before
  */
+/*
+ * XXX We can't quite do this - we delete the module before we grab
+ * m->next?
+ * We could just do a call_rcu.  Then the call_rcu happens in same
+ * rcu cycle has dereference, so module won't be deleted until the
+ * next cycle.
+ * That's 

Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Lee Revell
On Thu, 2005-07-14 at 11:24 +0200, Jan Engelhardt wrote:
>  "My expectation is if we want to beat the competition, we'll want
>  the ability to go *under* 100Hz."
> >>> 
> >>> What does Windows do here?
> >>
> >> windows xp base rate is 100Hz... but multimedia apps can ask for almost 
> >
> > 83Hz
> 
> Well, Windows 98 (vmmon) shows very different ones:

Wow.  Windows has been doing this since *98*?

So that's what Paul meant by "the stupidity of a fixed HZ, which is so
early '90s that its embarrassing".

Lee



Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-14 Thread K.R. Foley

K.R. Foley wrote:
> K.R. Foley wrote:
> > Karsten Wiese wrote:
> > > On Wednesday, 13 July 2005 16:01, K.R. Foley wrote:
> > > > Ingo Molnar wrote:
> > > > > * Chuck Harding <[EMAIL PROTECTED]> wrote:
> > > > > > > > CC [M]  sound/oss/emu10k1/midi.o
> > > > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before '__attribute__'
> > > > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before ')' token
> > > > > > >
> > > > > > > Here's the offending line:
> > > > > > >
> > > > > > > 48 static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> > > > > > >
> > > > > > > Lee
> > > > > >
> > > > > > I got it to compile but it won't boot - it hangs right after the
> > > > > > 'Uncompressing Linux... OK, booting the kernel' - I'm using .config
> > > > > > from 51-27 (attached)
> > > > >
> > > > > and -51-27 worked just fine? I've uploaded -29 with the -28 io-apic
> > > > > changes undone (will re-apply them once Karsten has figured out
> > > > > what's wrong).
> > > > >
> > > > > Ingo
> > > >
> > > > I too had the same problem booting -51-28 on my older SMP system at
> > > > home. -51-29 just booted fine.
> > >
> > > Have I corrected the other path of ioapic early initialization, which had lacked
> > > virtual-address setup before ioapic_data[ioapic] was to be filled in -51-28?
> > >
> > > Please test attached patch on top of -51-29 or later.
> > > Also on Systems that liked -51-28.
> > >
> > > thanks, Karsten
> >
> > Karsten,
> >
> > Just booted on my 2.6 dual Xeon w/HT and thus far all is well. I am
> > still building on the older SMP system that didn't like -51-28. Will
> > report after I try booting that one.
>
> Just booted on my older SMP box that barfed on -51-28. It would appear
> that the init problem is resolved.

DOH! All of the above is on -51-30 with Karsten's patch applied.

--
   kr


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-14 Thread K.R. Foley

K.R. Foley wrote:
> Karsten Wiese wrote:
> > On Wednesday, 13 July 2005 16:01, K.R. Foley wrote:
> > > Ingo Molnar wrote:
> > > > * Chuck Harding <[EMAIL PROTECTED]> wrote:
> > > > > > > CC [M]  sound/oss/emu10k1/midi.o
> > > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before '__attribute__'
> > > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before ')' token
> > > > > >
> > > > > > Here's the offending line:
> > > > > >
> > > > > > 48 static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> > > > > >
> > > > > > Lee
> > > > >
> > > > > I got it to compile but it won't boot - it hangs right after the
> > > > > 'Uncompressing Linux... OK, booting the kernel' - I'm using .config
> > > > > from 51-27 (attached)
> > > >
> > > > and -51-27 worked just fine? I've uploaded -29 with the -28 io-apic
> > > > changes undone (will re-apply them once Karsten has figured out
> > > > what's wrong).
> > > >
> > > > Ingo
> > >
> > > I too had the same problem booting -51-28 on my older SMP system at
> > > home. -51-29 just booted fine.
> >
> > Have I corrected the other path of ioapic early initialization, which had lacked
> > virtual-address setup before ioapic_data[ioapic] was to be filled in -51-28?
> >
> > Please test attached patch on top of -51-29 or later.
> > Also on Systems that liked -51-28.
> >
> > thanks, Karsten
>
> Karsten,
>
> Just booted on my 2.6 dual Xeon w/HT and thus far all is well. I am
> still building on the older SMP system that didn't like -51-28. Will
> report after I try booting that one.

Just booted on my older SMP box that barfed on -51-28. It would appear
that the init problem is resolved.


--
   kr


Re: [patch 2.6] remove PCI_BRIDGE_CTL_VGA handling from setup-bus.c

2005-07-14 Thread Jon Smirl
On 7/14/05, Russell King <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 14, 2005 at 03:53:44PM +0400, Ivan Kokshaysky wrote:
> > The setup-bus code doesn't work correctly for configurations
> > with more than one display adapter in the same PCI domain.
> > This stuff actually is a leftover of an early 2.4 PCI setup code
> > and apparently it stopped working after some "bridge_ctl" changes.
> > So the best thing we can do is just to remove it and rely on the fact
> > that any firmware *has* to configure VGA port forwarding for the boot
> > display device properly.
> 
> What happens when there is no firmware?
> 
> I'm sure this code would not have been added had there not been a reason
> for it.  Do we know why it was added?

I don't think it has ever been working in the 2.6 series. If you are
getting rid of it get rid of the #define PCI_BRIDGE_CTL_VGA in pci.h
too since this code was the only user.

Looking at the code as written I don't think it would work on my
machine with multiple VGA devices on different buses. I use the system
BIOS to enable the one I want and it sets up the bridges.

This code is part of VGA arbitration which BenH is addressing with a
more globally comprehensive patch. Ben's code will probably replace
it.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Open source firewalls

2005-07-14 Thread RVK

Helge Hafting wrote:
> RVK wrote:
> > Proxies can be a good way of filtering but it can't avoid buffer
> > overflows.
>
> Yes they can - did you read and understand my previous post at all?
> A proxy _can_ avoid a buffer overflow by noticing the
> anomalously large data item and simply refuse to pass
> it on to the real server!  The proxy can terminate the tcp
> connection and throw away the data.

Some of the validations can be done at the proxy end. But there are more
invisible scenarios than the simple visible ones. And it's definitely much
preferable to use Apache-like stuff than our own. I hope you agree with me...
I don't disagree on the proxy doing the filtering and validation; what I
mean to say is that it can't guarantee avoiding buffer overflows, as it can
itself be a source of them.

> > It can only increase it. More code more bugs.
>
> Of course the proxy can be buggy too, but it is easier to
> avoid problems there:
> 1. The server was written to perform a service, perhaps with
>    security thrown in later.  (Yes, that's bad design.)
>    A firewall proxy is written for security, so buffer overflows
>    are usually avoided in the firewall proxy itself.  Because this
>    is exactly what the firewall writer is thinking about.
> 2. The proxy may be much smaller and simpler than the server
>    it protects, it is therefore much easier to audit for security
>    problems.
> 3. Fixing the server is indeed best, but not necessarily an option.
>    It could be proprietary, or written in an unknown language.

No. As you are the only user of your program, the resources to
visualize all scenarios for all protocols are limited. No one would like to
keep on adding proxies for the sake of buffer overflows. A proxy is basically
taken as a facility for filtering.

> > If it is running on a hardware firewall as a service then its more
>
> "Hardware firewall" ???

Yes, an embedded firewall, where your gateway is protected by a firewall
device. The other kind is a software firewall solution.

> > dangerous as once it is compromised then IDS signatures also can be
> > deleted :-). No use of IDS then, right ?
>
> A compromised firewall is of no use - sure.  So what? That applies
> to any firewall, any IDS, or any server for that matter.

No, that's not true: once your firewall is compromised, it can affect other
services also. But if any one of the servers is compromised,
only that server is affected.

> > So the best way is either make your code free of buffer overflows or
>
> Yes, but the server may not be "my code" at all.  Can't you see that
> problem?  It may very well be someone elses code.  I may not have the
> source, or the source may be useless for a number of reasons,
> such as:
> 1. being written in a language I don't understand
> 2. Have a licence that forbids change
> 3. Need compilers/tools I don't have
> 4. Being such a nasty mess that writing a proxy is much easier
>    than fixing the bloated ill-designed server code one
>    unfortunately depends on for the time being.
>
> In these cases, I can still protect my server with a proxy firewall,
> although I can't fix the server itself.

Again, it will be your own code, with the limitation of having to take care
of all scenarios. Take an example: if we are trying to add a web proxy and
are using apache as our server, do you say that code written by us will be
more safe than apache ? :-)

> > use some library which controls the attack during any buffer overflow
> > or use Stack Randomisation and Canary based approaches.
>
> A library that controls any buffer overflow doesn't exist at all.

It's there and available. You just need to search.

> Stack randomization helps but doesn't solve all cases, the attacker
> simply needs code to search for the randomly moved parts he needs, pad with
> a few megabytes of NOPs and things like that.  Of course, a proxy
> can easily detect megabytes of NOPs and kill that connection . . .

It's not easy to mount an attack against stack randomization. It's like TCP
SYN randomization.

Regards
rvk

> Helge Hafting
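
The length check Helge describes (notice the anomalously large item and
refuse to pass it on) can be sketched in a few lines of C.  This is only an
illustration of the idea, not code from any real proxy: check_field, the
verdict enum and the max_len policy parameter are all assumed names.

```c
#include <stddef.h>

enum proxy_verdict { PROXY_FORWARD, PROXY_DROP };

/*
 * Decide whether a request field may be passed on to the backend.
 * Oversized items are rejected outright rather than truncated.
 */
static enum proxy_verdict check_field(const char *data, size_t len,
				      size_t max_len)
{
	if (data == NULL || len > max_len)
		return PROXY_DROP;
	return PROXY_FORWARD;
}
```

A proxy built on this still has to get its own buffer handling right, which
is the objection raised in the reply above.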


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-14 Thread K.R. Foley

Karsten Wiese wrote:
> On Wednesday, 13 July 2005 16:01, K.R. Foley wrote:
> > Ingo Molnar wrote:
> > > * Chuck Harding <[EMAIL PROTECTED]> wrote:
> > > > > > CC [M]  sound/oss/emu10k1/midi.o
> > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before '__attribute__'
> > > > > > sound/oss/emu10k1/midi.c:48: error: syntax error before ')' token
> > > > >
> > > > > Here's the offending line:
> > > > >
> > > > > 48 static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> > > > >
> > > > > Lee
> > > >
> > > > I got it to compile but it won't boot - it hangs right after the
> > > > 'Uncompressing Linux... OK, booting the kernel' - I'm using .config
> > > > from 51-27 (attached)
> > >
> > > and -51-27 worked just fine? I've uploaded -29 with the -28 io-apic
> > > changes undone (will re-apply them once Karsten has figured out what's
> > > wrong).
> > >
> > > Ingo
> >
> > I too had the same problem booting -51-28 on my older SMP system at
> > home. -51-29 just booted fine.
>
> Have I corrected the other path of ioapic early initialization, which had lacked
> virtual-address setup before ioapic_data[ioapic] was to be filled in -51-28?
> Please test attached patch on top of -51-29 or later.
> Also on Systems that liked -51-28.
>
> thanks, Karsten

Karsten,

Just booted on my 2.6 dual Xeon w/HT and thus far all is well. I am
still building on the older SMP system that didn't like -51-28. Will
report after I try booting that one.




--
   kr


Re: [patch 2.6] remove PCI_BRIDGE_CTL_VGA handling from setup-bus.c

2005-07-14 Thread Russell King
On Thu, Jul 14, 2005 at 03:53:44PM +0400, Ivan Kokshaysky wrote:
> The setup-bus code doesn't work correctly for configurations
> with more than one display adapter in the same PCI domain.
> This stuff actually is a leftover of an early 2.4 PCI setup code
> and apparently it stopped working after some "bridge_ctl" changes.
> So the best thing we can do is just to remove it and rely on the fact
> that any firmware *has* to configure VGA port forwarding for the boot
> display device properly.

What happens when there is no firmware?

I'm sure this code would not have been added had there not been a reason
for it.  Do we know why it was added?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


Re: Thread_Id

2005-07-14 Thread RVK


Jakub Jelinek wrote:
> On Thu, Jul 14, 2005 at 02:25:43PM +0200, Arjan van de Ven wrote:
> > pure luck. NPTL threading uses it to store a pointer to per thread info
> > structure; other threading (linuxthreads) may have stored a pid there to
> > identify the internal thread. nptl is 2.6 only so you might have
> > switched implementation of threading when you switched kernels.
>
> Actually, in linuxthreads what pthread_self () returned has the first slot
> in its internal threads array (up to max number of supported threads)
> that was unused at thread creation time in the low order bits and sequence
> number of thread creation in its high order bits.
> So unless you are using yet another threading library (I thought NGPT
> is dead for years...), the claim that you get the same numbers from
> gettid() syscall under NPTL as pthread_self () gives you under LinuxThreads
> is simply not true.  And you certainly shouldn't be using gettid ()
> syscall in NPTL, as it is just an implementation detail that there is
> a 1:1 mapping between NPTL threads and kernel threads.  It can change
> at any time.

Whichever implementation is used, it is expected to be backward compatible,
especially for thread libraries, since a lot of applications use them.

rvk

> Jakub


Re: fdisk: What do plus signs after "Blocks" mean?

2005-07-14 Thread kernel
I always thought;

First 446 bytes are boot code and all
Next 64 bytes are for 4 partition records, 16 bytes each
Last 2 bytes are signature 

?

-fd
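
For reference, that 446 + 64 + 2 split can be written down as a C layout.
This is only a sketch: the struct and field names are made up here, and
every field is a byte array so the compiler inserts no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* One 16-byte partition record. */
struct mbr_partition_entry {
	uint8_t status;			/* 0x80 = bootable, 0x00 = inactive */
	uint8_t chs_first[3];		/* CHS address of first sector */
	uint8_t type;			/* partition type, e.g. 0x05 = extended */
	uint8_t chs_last[3];		/* CHS address of last sector */
	uint8_t lba_first[4];		/* little-endian start LBA */
	uint8_t sector_count[4];	/* little-endian number of sectors */
};

/* The whole 512-byte first sector. */
struct mbr_sector {
	uint8_t boot_code[446];
	struct mbr_partition_entry partitions[4];
	uint8_t signature[2];		/* 0x55, 0xAA */
};

/* Non-zero if the sector ends with the 0x55 0xAA boot signature. */
static int mbr_has_signature(const struct mbr_sector *mbr)
{
	return mbr->signature[0] == 0x55 && mbr->signature[1] == 0xAA;
}
```

The partition table therefore starts at byte offset 446 and the signature
sits in bytes 510 and 511.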


On Wed, 2005-07-13 at 06:24, Jan Engelhardt wrote:
> > Guys, thanks a lot for the explanations!
> >
> > Actually, it seems like one can backup information on ALL partitions
> >by using the command "sfdisk -dx /dev/hdX". Supposedly, it reads not
> >only primary but also extended partitions. "sfdisk -x /dev/hdX" should
> >be then able to write whatever is known back to the disk.
> 
> MBR size is 448 bytes, the rest is "the partition table", with space for four 
> entries. If one wants more, then s/he creates a [primary] partition, tagging 
> it "extended", and the "extended partiton table" is within that primary 
> partition. So yes, by dd'ing /dev/hdX, you get everything. Including "lost 
> sectors" if you dd it back to a bigger HD.


Re: XFS corruption on move from xscale to i686

2005-07-14 Thread Yura Pakhuchiy
2005/7/14, Nathan Scott <[EMAIL PROTECTED]>:
> On Wed, Jul 13, 2005 at 06:22:28PM +0300, Yura Pakhuchiy wrote:
> > I found patch by Greg Ungreger to fix this problem, but why it's still
> > not in mainline? Or it's a gcc problem and should be fixed by gcc folks?
> 
> Yes, IIRC the patch was incorrect for other platforms, and it sure
> looked like an arm-specific gcc problem (this was ages back, so
> perhaps its fixed by now).

AFAIR gcc-3.4.3 was released after this conversation took place at linux-xfs,
maybe add something like this:

#ifdef XSCALE
/* We need this because some gcc versions for xscale are broken. */
[patched version here]
#else
[original version here]
#endif

Best regards,
Yura


Re: serial: 8250 fails to detect Exar XR16L2551 correctly

2005-07-14 Thread Russell King
On Wed, Jul 13, 2005 at 11:04:56AM -0600, Alex Williamson wrote:
> On Mon, 2005-07-11 at 15:17 -0600, Alex Williamson wrote:
> >No, I think this is a problem with the broken A2 UARTs getting
> > confused in serial8250_set_sleep().  If I remove either UART_CAP_SLEEP
> > or UART_CAP_EFR from the capabilities list for this UART, it behaves
> > normally.  Also, just commenting out the UART_CAP_EFR chunks of
> > set_sleep make it behave.  I'll ping Exar for more data.  Thanks,
> 
> Hi Russell,
> 
>I don't know enough about the extended UART programming model, but I
> notice that when UART_CAP_EFR and UART_CAP_SLEEP are set on a UART, we
> set the UART_IERX_SLEEP bit in the UART_IER immediately after it's found
> and configured.

Ah, I see what's happening.  We're detecting the port and doing the
autoconfig.  Then we're checking to see if it's the console, and if
not putting it into low power mode.

Then we try to register the console, which may result in this UART
becoming a console.  So now we have a console which is in low power
mode.  Bad bad bad.  No cookie for the serial layer today.
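
The ordering problem described above can be modelled in a few lines. This is only a toy sketch, not the 8250 driver: struct toy_port, toy_autoconfig() and toy_register_console() are made-up names, and the "asleep" flag merely stands in for the UART_IERX_SLEEP bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a UART port; none of these names exist in the 8250 driver. */
struct toy_port {
	bool is_console;	/* set once the port has been chosen as console */
	bool asleep;		/* stands in for the UART_IERX_SLEEP bit */
};

/* After autoconfig, any port that is not (yet) the console is put to sleep. */
static void toy_autoconfig(struct toy_port *p)
{
	assert(p != NULL);
	if (!p->is_console)
		p->asleep = true;
}

/* Console registration as described above: it never wakes the port up. */
static void toy_register_console(struct toy_port *p)
{
	assert(p != NULL);
	p->is_console = true;
}

/* Problematic order: sleep decision first, console registration second. */
static bool console_asleep_buggy_order(void)
{
	struct toy_port p = { false, false };

	toy_autoconfig(&p);
	toy_register_console(&p);
	return p.is_console && p.asleep;
}

/* One possible order: decide about sleeping only once the console is known. */
static bool console_asleep_fixed_order(void)
{
	struct toy_port p = { false, false };

	toy_register_console(&p);
	toy_autoconfig(&p);
	return p.is_console && p.asleep;
}
```

In the first order the console ends up in low power mode; in the second it does not. Whether the real fix reorders the calls or wakes the port at console registration is not something this sketch decides.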

> Are there known working configs where a UART w/ EFR and SLEEP are
> able to be used as a serial console?

No idea - I'm completely reliant on other folk to report problems
with the 8250 driver with their random versions of UARTs which are
out in the field.  I only have 16450, 16550A and 16750 UARTs here.

Hmm, I need to consider killing register_serial() and the associated
code in serial_core.c earlier so I can sanely fix this problem.  I
think it's time to give the remaining register_serial() users an
extra push... I haven't seen _any_ activity from the remaining users,
so I might have to take the attitude that "if they don't care, I don't
care about breaking their code" which would be rather shameful as far
as the users go.  (but hey, user pressure might wake up the maintainers.)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-14 Thread K.R. Foley

Ingo Molnar wrote:
> * Chuck Harding <[EMAIL PROTECTED]> wrote:
> 
> > I missed getting -51-29 but just booted up -51-30 and all is well.
> > Thanks. Just out of curiosity, what was changed between -51-28, 29,
> > and 30?
> 
> -51-29 had new IO-APIC optimizations - and i reverted them in -51-30.
> 
> 	Ingo

Ingo,

I just noticed that the keyboard repeat problem is back in a bad way in 
-51-30. I was not seeing this before I left this PC about 16 hours 
ago. And the uptime is:

 08:34:10 up 18:46,  7 users,  load average: 3.32, 3.24, 2.53

--
   kr


Re: [patch 2.6] remove PCI_BRIDGE_CTL_VGA handling from setup-bus.c

2005-07-14 Thread Jon Smirl
On 7/14/05, Ivan Kokshaysky <[EMAIL PROTECTED]> wrote:
> The setup-bus code doesn't work correctly for configurations
> with more than one display adapter in the same PCI domain.
> This stuff actually is a leftover of an early 2.4 PCI setup code
> and apparently it stopped working after some "bridge_ctl" changes.
> So the best thing we can do is just to remove it and rely on the fact
> that any firmware *has* to configure VGA port forwarding for the boot
> display device properly.

This fixes my system where the VGA display device is on the second bus.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Merging relayfs?

2005-07-14 Thread Roman Zippel
Hi,

On Mon, 11 Jul 2005, Andrew Morton wrote:

> > > Hi Andrew, can you please merge relayfs?  It provides a low-overhead
> > > logging and buffering capability, which does not currently exist in
> > > the kernel.
> > 
> > While the code is pretty nicely in shape it seems rather pointless to
> > merge until an actual user goes with it.
> 
> Ordinarily I'd agree.  But this is a bit like kprobes - it's a funny thing
> which other kernel features rely upon, but those features are often ad-hoc
> and aren't intended for merging.

I agree with Christoph, I'd like to see a small (and useful) example 
included, which can be used as a reference. relayfs clients still need some 
code of their own to communicate with user space. Looking at the example 
code, I'm not really sure netlink is a good way to go as a control channel.
kprobes has a rather simple interface, relayfs is more complex and I think 
it's a good idea to provide some sane and complete example code to copy 
from.

Looking through the patch there are still a few areas I'm concerned about:
- the usage of atomic_t looks a little silly; there is only a single 
writer, and it probably needs some cache line optimisations
- I would prefer "unsigned int" over just "unsigned"
- the padding/commit arrays can be easily managed by the client
- overwrite mode can be implemented via the buffer switch callback
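
The last point in the list above can be sketched roughly as follows. This is not the relayfs API: struct toy_chan, toy_switch() and both callbacks are hypothetical names, and the sketch only assumes a single writer cycling through a fixed set of sub-buffers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUF 4	/* number of sub-buffers; arbitrary for the sketch */

struct toy_chan;

/* Client callback: return nonzero to allow switching into sub-buffer 'next'. */
typedef int (*toy_switch_cb)(struct toy_chan *ch, size_t next);

struct toy_chan {
	int full[TOY_NBUF];	/* 1 if the sub-buffer holds unread data */
	size_t cur;		/* sub-buffer currently being written */
	unsigned int lost;	/* sub-buffers overwritten before being read */
	toy_switch_cb on_switch;
};

/* Overwrite policy: always continue, counting what gets clobbered. */
static int toy_overwrite(struct toy_chan *ch, size_t next)
{
	if (ch->full[next])
		ch->lost++;
	return 1;
}

/* No-overwrite policy: refuse to enter a sub-buffer that is still unread. */
static int toy_no_overwrite(struct toy_chan *ch, size_t next)
{
	return !ch->full[next];
}

/* Called by the single writer when the current sub-buffer is full. */
static int toy_switch(struct toy_chan *ch)
{
	size_t next = (ch->cur + 1) % TOY_NBUF;

	assert(ch->on_switch != NULL);
	ch->full[ch->cur] = 1;
	if (!ch->on_switch(ch, next))
		return 0;	/* writer must stop or drop events */
	ch->full[next] = 0;
	ch->cur = next;
	return 1;
}
```

The core code stays policy-free here; whether old data may be clobbered is decided entirely by which callback the client installs.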

In general I'm not against merging, but I have a few ideas for further 
cleanups/optimisations and it really would help to have some useful 
example code (e.g. a _simple_ event tracer).

bye, Roman


Re: moving DRM header files

2005-07-14 Thread Jon Smirl
On 7/14/05, Dave Airlie <[EMAIL PROTECTED]> wrote:
> > > I'm thinking include/linux/drm/
> > > but include/linux would also be possible.
> > >
> > > Any suggestions or ideas?
> >
> > If you're in a mood to move things, how about moving drivers/char/drm
> > to drivers/video/drm.
> 
> But that has little point beyond aesthetics... moving the header files
> is for a reason that I want them to start appearing in userspace
> includeable places.. as part of the cleanup for libdrm..
> 
> Moving c files internally in the kernel provides no real benefit over
> not moving them..

When you start merging DRM and fbdev you will be able to use relative
paths that are closer together.  For example, #include
"../char/drm/drmP.h" versus #include "drm/drmP.h" for internal
headers.

DRM and fbdev need to be moved next to each other in kconfig too if
they start depending on each other. It is hard to figure out that a
video option might not be visible because the char/drm/option is not
turned on.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: [rfc patch 2/2] direct-io: remove address alignment check

2005-07-14 Thread Andi Kleen
Daniel McNeil <[EMAIL PROTECTED]> writes:

> This patch relaxes the direct i/o alignment check so that user addresses
> do not have to be a multiple of the device block size.

The original reason for this limit was that lots of drivers
(not only IDE) explode when you give them odd sizes. Sometimes
it is even worse.
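
For reference, the kind of check being discussed looks roughly like this. It is a minimal sketch, not the code from fs/direct-io.c; dio_aligned() is a made-up name and the block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if both the user address and the length are multiples of
 * blksize.  blksize must be a power of two (e.g. 512 or 4096).
 */
static int dio_aligned(uintptr_t addr, size_t len, size_t blksize)
{
	size_t mask = blksize - 1;

	assert(blksize != 0 && (blksize & mask) == 0);
	return ((addr | len) & mask) == 0;
}
```

Relaxing the address half of such a test means a driver can be handed a buffer starting at, say, 0x1001, which is exactly the case the reply above is worried about.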

I doubt all of them have been fixed.

Very risky change.

-Andi


[PATCH] mb_cache_shrink() frees unexpected caches

2005-07-14 Thread Akinobu Mita
mb_cache_shrink() tries to free entries of every mbcache on the LRU list,
not only those of the cache passed in.

All users of mb_cache_shrink() are ext2/ext3 xattr.

Signed-off-by: Akinobu Mita <[EMAIL PROTECTED]>

--- 2.6-rc/fs/mbcache.c.orig	2005-07-14 20:40:34.0 +0900
+++ 2.6-rc/fs/mbcache.c	2005-07-14 20:43:42.0 +0900
@@ -329,7 +329,7 @@ mb_cache_shrink(struct mb_cache *cache, 
 	list_for_each_safe(l, ltmp, &mb_cache_lru_list) {
 		struct mb_cache_entry *ce =
 			list_entry(l, struct mb_cache_entry, e_lru_list);
-		if (ce->e_bdev == bdev) {
+		if (ce->e_cache == cache && ce->e_bdev == bdev) {
 			list_move_tail(&ce->e_lru_list, &free_list);
 			__mb_cache_entry_unhash(ce);
 		}
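
The effect of the changed condition can be seen on a toy list. This is only an illustration under the assumption that several caches share one LRU list; struct toy_entry and the two counting helpers are made up and stand in for the list walk in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real structures live in fs/mbcache.c. */
struct toy_entry {
	int cache;	/* which cache owns the entry (ce->e_cache in the patch) */
	int bdev;	/* which block device it belongs to (ce->e_bdev) */
};

/* Old test: any entry on the shared LRU list for this device matches. */
static size_t count_old(const struct toy_entry *e, size_t n, int bdev)
{
	size_t i, hits = 0;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (e[i].bdev == bdev)
			hits++;
	return hits;
}

/* Patched test: the entry must also belong to the cache being shrunk. */
static size_t count_new(const struct toy_entry *e, size_t n,
			int cache, int bdev)
{
	size_t i, hits = 0;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (e[i].cache == cache && e[i].bdev == bdev)
			hits++;
	return hits;
}
```

With two caches holding entries for the same device, the old test selects entries of both, while the patched one selects only those of the cache that was passed in.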




Re: Thread_Id

2005-07-14 Thread Benedikt Spranger
> And you certainly shouldn't be using gettid () syscall in NPTL, as it 
> is just an implementation detail that there is a 1:1 mapping between 
> NPTL threads and kernel threads.  It can change at any time.

Maybe I missed the point, but I thought the 1:1 mapping between NPTL
threads and kernel threads is one of the advantages of NPTL and the idea
of a userland scheduler is quite dead. 

So please let gettid do what man gettid assures:
gettid  returns the thread ID of the current process. This is equal to
the process ID (as returned by getpid(2)), unless the process is  part
of  a thread group (created by specifying the CLONE_THREAD flag to the
clone(2) system call). All processes in the same thread group have the
same PID, but each one has a unique TID.
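
The behaviour quoted from the man page can be checked with a few lines. This is a minimal sketch for Linux; it goes through syscall(2) on the assumption that the C library offers no gettid() wrapper, and current_tid() and is_main_thread() are names made up here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller, obtained via the raw system call. */
static pid_t current_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* True when the caller is the initial thread of its thread group. */
static int is_main_thread(void)
{
	return current_tid() == getpid();
}
```

In a single-threaded process is_main_thread() is always true; in a thread created with CLONE_THREAD the TID differs from the PID, as the man page text above says.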

Bene



Re: Open source firewalls

2005-07-14 Thread Helge Hafting

RVK wrote:

> Proxies can be a good way of filtering but it can't avoid buffer
> overflows.

Yes they can - did you read and understand my previous post at all?
A proxy _can_ avoid a buffer overflow by noticing the
anomalously large data item and simply refusing to pass
it on to the real server!  The proxy can terminate the tcp
connection and throw away the data.

> It can only increase it. More code more bugs.

Of course the proxy can be buggy too, but it is easier to
avoid problems there:
1. The server was written to perform a service, perhaps with
   security thrown in later.  (Yes, that's bad design.)
   A firewall proxy is written for security, so buffer overflows
   are usually avoided in the firewall proxy itself, because this
   is exactly what the firewall writer is thinking about.
2. The proxy may be much smaller and simpler than the server
   it protects; it is therefore much easier to audit for security
   problems.
3. Fixing the server is indeed best, but not necessarily an option.
   It could be proprietary, or written in an unknown language.

> If it is running on a hardware firewall as a service then it's more

"Hardware firewall" ???

> dangerous as once it is compromised then IDS signatures also can be
> deleted :-). No use of IDS then, right?

A compromised firewall is of no use - sure.  So what? That applies
to any firewall, any IDS, or any server for that matter.

> So the best way is either make your code free of buffer overflows or

Yes, but the server may not be "my code" at all.  Can't you see that
problem?  It may very well be someone else's code.  I may not have the
source, or the source may be useless for a number of reasons,
such as:
1. Being written in a language I don't understand
2. Having a licence that forbids change
3. Needing compilers/tools I don't have
4. Being such a nasty mess that writing a proxy is much easier
   than fixing the bloated ill-designed server code one
   unfortunately depends on for the time being.

In these cases, I can still protect my server with a proxy firewall,
although I can't fix the server itself.

> use some library which controls the attack during any buffer overflow
> or use Stack Randomisation and Canary based approaches.

A library that controls any buffer overflow doesn't exist at all.

Stack randomization helps but doesn't solve all cases; the attacker
simply needs code to search for the randomly moved parts he needs, pad with
a few megabytes of NOPs and things like that.  Of course, a proxy
can easily detect megabytes of NOPs and kill that connection . . .
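
A crude version of that last check might look like the sketch below. The function names and the threshold are invented for illustration, and it only counts the single-byte x86 NOP (0x90); a real sled can use other filler instructions, and a binary protocol may legitimately carry long runs of 0x90.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the longest run of x86 NOP bytes (0x90) in buf. */
static size_t longest_nop_run(const unsigned char *buf, size_t len)
{
	size_t i, run = 0, best = 0;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		run = (buf[i] == 0x90) ? run + 1 : 0;
		if (run > best)
			best = run;
	}
	return best;
}

/* Policy hook: the threshold is site-specific and purely illustrative. */
static int looks_like_nop_sled(const unsigned char *buf, size_t len,
			       size_t threshold)
{
	return longest_nop_run(buf, len) >= threshold;
}
```

A proxy would run something like this over each protocol field it has already parsed, and drop the connection when the test fires.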

Helge Hafting


[RFC/RFF][PATCH] rm -rf linux/arch/i386/boot

2005-07-14 Thread Etienne Lorrain
  Hello,

 I am not sure the "rm -rf linux/arch/i386/boot" is an acceptable delta
for the current code management system, but anyways it is only the second
step - after the attached patch has been discussed, modified and
hopefully accepted.

Unfortunately this second step will break compatibility with LILO and
GRUB - the kernel would only boot with Gujin version 1.2 or later
( http://gujin.org ), so we have some time before this cleaning begins,
and have to stay compatible in between.

 In the meantime, the current patch is a complete rewrite of all
the code executed in real mode - and so a complete replacement of
the whole directory linux/arch/i386/boot by one C file named
arch/i386/kernel/realmode.c and its include file include/asm-i386/realmode.h
 The mapping of the BIOS information reported to the kernel is the same,
the one described in Linux/Documentation/i386/zero-page.txt - but is now
expressed in the form of C structures.
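
To give an idea of what "expressed in the form of C structures" can mean, here is a small sketch covering a handful of fields. It is not taken from realmode.h: the struct and field names are invented, and the offsets are the ones I recall from zero-page.txt (0x1F1 setup size in sectors, 0x1FA VGA mode, 0x1FC root device), so treat them as an assumption to be checked against that file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partial layout; names are not those used by the patch. */
struct toy_zero_page {
	uint8_t  other[0x1F1];		/* earlier BIOS data, not modelled here */
	uint8_t  setup_sects;		/* 0x1F1: size of setup code, in sectors */
	uint16_t root_rdonly;		/* 0x1F2: mount root read-only if != 0 */
	uint16_t kernel_size16;		/* 0x1F4: kernel size in 16-byte units */
	uint16_t swap_dev;		/* 0x1F6 */
	uint16_t ramdisk_flags;		/* 0x1F8 */
	uint16_t vga_mode;		/* 0x1FA */
	uint16_t root_dev;		/* 0x1FC: high byte major, low byte minor */
} __attribute__((packed));

/* Major number of the root device (high byte of the 16-bit field). */
static unsigned int toy_root_major(const struct toy_zero_page *zp)
{
	assert(zp != NULL);
	return zp->root_dev >> 8;
}

/* Minor number of the root device (low byte of the 16-bit field). */
static unsigned int toy_root_minor(const struct toy_zero_page *zp)
{
	assert(zp != NULL);
	return zp->root_dev & 0xFF;
}
```

The packed attribute is a GCC extension; it keeps the compiler from inserting padding, so the field offsets match the documented byte offsets.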

 The kernel file becomes a lot simpler to generate, it is just an ELF
file (the usual file linux/vmlinux you already get during the build
process) transformed into binary by objcopy and gzip'ed. A small part
is added during the link of the vmlinux file: the content of realmode.c,
which contains a C function that the kernel needs to get information
from the BIOS (compiled with GCC / executed in real mode).
 Most of this function is written in C - have a look for yourself.
To generate a kernel:
make /boot/linux-2.6.13.kgz                  # the root filesystem will be autodetected
make /boot/linux-2.6.13.kgz ROOT=/dev/hda3   # root filesystem forced

 You will need to install Gujin, either on a floppy, on your hard disk
in a partition or at the end of your hard disk, or on a CDROM. No
configuration of Gujin is needed - because the configuration file does
not (and never will) exist.
 Have a look at the Gujin FAQ in the Documentation Manager of sourceforge
before asking, please.

 The attached patch is made based on linux-2.6.13-rc2 , to apply to
linux-2.6.12 you need to modify the patch replacing "phys_startup_32"
by "startup_32" and removing " - LOAD_OFFSET" before applying.
 I will learn GIT soon - but for now...

 The generation process needs to insert comments into the GZIP file, in
the field reserved for this purpose, for instance to describe kernel
characteristics like which processor is supported, and so the patch
contains a BSD-licenced tool I wrote named gzcopy for the job.
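
For readers who have not looked at the format: the comment field sits in the gzip header after the optional extra field and file name. The sketch below reads it back following the header layout in RFC 1952; it says nothing about how gzcopy itself works, and gz_comment() is a name made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GZ_FEXTRA   0x04	/* FLG bit: extra field present */
#define GZ_FNAME    0x08	/* FLG bit: original file name present */
#define GZ_FCOMMENT 0x10	/* FLG bit: comment present */

/*
 * Return a pointer to the NUL-terminated comment inside a gzip header,
 * or NULL if there is none or the header is malformed or truncated.
 */
static const char *gz_comment(const unsigned char *p, size_t len)
{
	size_t pos = 10;	/* fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS */
	size_t xlen;
	unsigned char flg;
	const unsigned char *end;

	assert(p != NULL || len == 0);
	if (len < 10 || p[0] != 0x1f || p[1] != 0x8b)
		return NULL;
	flg = p[3];
	if (flg & GZ_FEXTRA) {		/* 2-byte little-endian length + data */
		if (len - pos < 2)
			return NULL;
		xlen = p[pos] | ((size_t)p[pos + 1] << 8);
		pos += 2;
		if (len - pos < xlen)
			return NULL;
		pos += xlen;
	}
	if (flg & GZ_FNAME) {		/* skip NUL-terminated file name */
		end = memchr(p + pos, 0, len - pos);
		if (end == NULL)
			return NULL;
		pos = (size_t)(end - p) + 1;
	}
	if (!(flg & GZ_FCOMMENT))
		return NULL;
	end = memchr(p + pos, 0, len - pos);
	return end ? (const char *)(p + pos) : NULL;
}
```

A bootloader could parse such a comment before decompressing anything, which is presumably why the field is attractive for describing kernel characteristics.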

 Note that there is no longer a limit on the size of the kernel using
Gujin - but an "all yes config" without any modules will only boot
up to some point: it seems that some (audio card?) drivers need
DMA-able memory (i.e. below 16 Mbytes), else they crash the kernel.
 Unfortunately by default this "all yes" kernel is loaded at address
1 Mbyte and it is far bigger than 15 Mbytes. Modifying the
load address needs some change in the kernel - Gujin is already able
to load anywhere you want.

 Note also that the x86_64 architecture will need some cleaning too,
but it will only be the third step.

Signed-off-by: Etienne Lorrain <[EMAIL PROTECTED]>

  Have fun [you can check by yourself, it works],
  Etienne.


patch-gujin-2613rc2.gz
Description: application/gzip-compressed


Re: Realtime Preemption, 2.6.12, Beginners Guide?

2005-07-14 Thread Karsten Wiese
On Wednesday, 13 July 2005 16:01, K.R. Foley wrote:
> Ingo Molnar wrote:
> > * Chuck Harding <[EMAIL PROTECTED]> wrote:
> > 
> > 
> >>>CC [M]  sound/oss/emu10k1/midi.o
> >>>sound/oss/emu10k1/midi.c:48: error: syntax error before '__attribute__'
> >>>sound/oss/emu10k1/midi.c:48: error: syntax error before ')' token
> >>>
> >>>Here's the offending line:
> >>>
> >>>  48 static DEFINE_SPINLOCK(midi_spinlock __attribute((unused)));
> >>>
> >>>Lee
> >>>
> >>
> >>I got it to compile but it won't boot - it hangs right after the
> >>'Uncompressing Linux... OK, booting the kernel' - I'm using .config
> >>from 51-27 (attached)
> > 
> > 
> > and -51-27 worked just fine? I've uploaded -29 with the -28 io-apic 
> > changes undone (will re-apply them once Karsten has figured out what's 
> > wrong).
> > 
> > Ingo
> 
> I too had the same problem booting -51-28 on my older SMP system at 
> home. -51-29 just booted fine.
> 
Have I corrected the other path of ioapic early initialization, which lacked
virtual-address setup before ioapic_data[ioapic] was filled in, in -51-28?
Please test the attached patch on top of -51-29 or later,
including on systems that liked -51-28.

thanks, Karsten
diff -ur ../linux-2.6.12-RT-51-23/arch/i386/kernel/apic.c ./arch/i386/kernel/apic.c
--- ../linux-2.6.12-RT-51-23/arch/i386/kernel/apic.c	2005-07-14 12:31:33.0 +0200
+++ linux-2.6.12-RT/arch/i386/kernel/apic.c	2005-07-14 12:34:53.0 +0200
@@ -832,10 +832,10 @@
 ioapic_phys = (unsigned long)
 	  alloc_bootmem_pages(PAGE_SIZE);
 ioapic_phys = __pa(ioapic_phys);
+set_fixmap_nocache(idx, ioapic_phys);
+printk(KERN_DEBUG "faked IOAPIC to %08lx (%08lx)\n",
+   __fix_to_virt(idx), ioapic_phys);
 			}
-			set_fixmap_nocache(idx, ioapic_phys);
-			printk(KERN_DEBUG "mapped IOAPIC to %08lx (%08lx)\n",
-			   __fix_to_virt(idx), ioapic_phys);
 			idx++;
 		}
 	}
diff -ur ../linux-2.6.12-RT-51-23/arch/i386/kernel/io_apic.c ./arch/i386/kernel/io_apic.c
--- ../linux-2.6.12-RT-51-23/arch/i386/kernel/io_apic.c	2005-07-09 23:49:21.0 +0200
+++ linux-2.6.12-RT/arch/i386/kernel/io_apic.c	2005-07-14 12:34:54.0 +0200
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -55,11 +56,6 @@
 int sis_apic_bug = -1;
 
 /*
- * # of IRQ routing registers
- */
-int nr_ioapic_registers[MAX_IO_APICS];
-
-/*
  * Rough estimation of how many shared IRQs there are, can
  * be changed anytime.
  */
@@ -132,88 +128,74 @@
 # define IOAPIC_CACHE
 #endif
 
-#ifdef IOAPIC_CACHE
-# define MAX_IOAPIC_CACHE 512
 
-/*
- * Cache register values:
- */
-static struct {
-	unsigned int reg;
-	unsigned int val[MAX_IOAPIC_CACHE];
-} io_apic_cache[MAX_IO_APICS]
-		cacheline_aligned_in_smp;
+
+struct ioapic_data_struct {
+	struct sys_device dev;
+	int nr_registers;	//  # of IRQ routing registers
+	volatile unsigned int *base;
+	struct IO_APIC_route_entry *entry;
+#ifdef IOAPIC_CACHE
+	unsigned int reg_set;
+	u32 cached_val[0];
 #endif
+};
 
-volatile unsigned int *io_apic_base[MAX_IO_APICS];
+static struct ioapic_data_struct *ioapic_data[MAX_IO_APICS];
 
-static inline unsigned int __raw_io_apic_read(unsigned int apic, unsigned int reg)
+
+static inline unsigned int __raw_io_apic_read(struct ioapic_data_struct *ioapic, unsigned int reg)
 {
-	volatile unsigned int *io_apic;
-#ifdef IOAPIC_CACHE
-	io_apic_cache[apic].reg = reg;
-#endif
-	io_apic = io_apic_base[apic];
-	io_apic[0] = reg;
-	return io_apic[4];
+# ifdef IOAPIC_CACHE
+	ioapic->reg_set = reg;
+# endif
+	ioapic->base[0] = reg;
+	return ioapic->base[4];
 }
 
-unsigned int raw_io_apic_read(unsigned int apic, unsigned int reg)
+
+# ifdef IOAPIC_CACHE
+static void __init ioapic_cache_init(struct ioapic_data_struct *ioapic)
 {
-	unsigned int val = __raw_io_apic_read(apic, reg);
+	int reg;
+	for (reg = 0; reg < (ioapic->nr_registers + 10); reg++)
+		ioapic->cached_val[reg] = __raw_io_apic_read(ioapic, reg);
+}
+# endif
 
-#ifdef IOAPIC_CACHE
-	io_apic_cache[apic].val[reg] = val;
-#endif
+
+static unsigned int raw_io_apic_read(struct ioapic_data_struct *ioapic, unsigned int reg)
+{
+	unsigned int val = __raw_io_apic_read(ioapic, reg);
+
+# ifdef IOAPIC_CACHE
+	ioapic->cached_val[reg] = val;
+# endif
 	return val;
 }
 
-unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int io_apic_read(struct ioapic_data_struct *ioapic, unsigned int reg)
 {
-#ifdef IOAPIC_CACHE
-	if (unlikely(reg >= MAX_IOAPIC_CACHE)) {
-		static int once = 1;
-
-		if (once) {
-			once = 0;
-			printk("WARNING: ioapic register cache overflow: %d.\n",
-reg);
-			dump_stack();
-		}
-		return __raw_io_apic_read(apic, reg);
-	}
-	if (io_apic_cache[apic].val[reg] && !sis_apic_bug) {
-		io_apic_cache[apic].reg = -1;
-		return io_apic_cache[apic].val[reg];
+# ifdef IOAPIC_CACHE
+	if (likely(!sis_apic_bug)) {
+		ioapic->reg_set = -1;
+		return ioapic->cached_val[reg];
 	}
-#endif
-	return raw_io_apic_read(apic, reg);
+# endif
+	return 

Re: Thread_Id

2005-07-14 Thread Jakub Jelinek
On Thu, Jul 14, 2005 at 02:25:43PM +0200, Arjan van de Ven wrote:
> pure luck. NPTL threading uses it to store a pointer to per thread info
> structure; other threading (linuxthreads) may have stored a pid there to
> identify the internal thread. nptl is 2.6 only so you might have
> switched implementation of threading when you switched kernels.

Actually, in linuxthreads what pthread_self () returned has the first slot
in its internal threads array (up to max number of supported threads)
that was unused at thread creation time in the low order bits and sequence
number of thread creation in its high order bits.
So unless you are using yet another threading library (I thought NGPT
is dead for years...), the claim that you get the same numbers from
gettid() syscall under NPTL as pthread_self () gives you under LinuxThreads
is simply not true.  And you certainly shouldn't be using gettid ()
syscall in NPTL, as it is just an implementation detail that there is
a 1:1 mapping between NPTL threads and kernel threads.  It can change
at any time.
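
The handle layout described above (table slot in the low-order part, creation sequence number in the high-order part) can be illustrated like this. It is not LinuxThreads source: the function names are invented and the table size of 1024 is an assumption made for the example.

```c
#include <assert.h>

#define TOY_MAX_THREADS 1024UL	/* assumed table size, for illustration only */

/*
 * Pack a table slot (low-order part) and a creation sequence number
 * (high-order part) into one handle.
 */
static unsigned long toy_make_handle(unsigned long seq, unsigned long slot)
{
	assert(slot < TOY_MAX_THREADS);
	return seq * TOY_MAX_THREADS + slot;
}

/* Recover the table slot from a handle. */
static unsigned long toy_handle_slot(unsigned long h)
{
	return h % TOY_MAX_THREADS;
}

/* Recover the creation sequence number from a handle. */
static unsigned long toy_handle_seq(unsigned long h)
{
	return h / TOY_MAX_THREADS;
}
```

A value built this way has no relation to a kernel TID, which is the point being made: a later thread reusing the same slot still gets a different handle because the sequence number has moved on.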

Jakub


Re: Re-routing packets via netfilter (ip_rt_bug)

2005-07-14 Thread Ric Wheeler


Patrick, Herbert,

This issue still seems to be in the latest trees - is this patch or a 
variation on it still bumping around?


Thanks!

Yair Itzhaki wrote:


Can anyone propose a patch that I can start checking?

I have come up with the following:

--- net/core/netfilter.c.orig	2005-04-18 21:55:30.0 +0300
+++ net/core/netfilter.c	2005-05-02 17:35:20.0 +0300
@@ -622,9 +622,10 @@
 	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
 	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
 	 */
-	if (inet_addr_type(iph->saddr) == RTN_LOCAL) {
+	if ((inet_addr_type(iph->saddr) == RTN_LOCAL) ||
+	    (inet_addr_type(iph->daddr) == RTN_LOCAL)) {
 		fl.nl_u.ip4_u.daddr = iph->daddr;
-		fl.nl_u.ip4_u.saddr = iph->saddr;
+		fl.nl_u.ip4_u.saddr = 0;
 		fl.nl_u.ip4_u.tos = RT_TOS(iph->tos);
 		fl.oif = (*pskb)->sk ? (*pskb)->sk->sk_bound_dev_if : 0;
 #ifdef CONFIG_IP_ROUTE_FWMARK

Please advise,
Yair


 


-----Original Message-----
From: Patrick McHardy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 14:05
To: Herbert Xu
Cc: Jozsef Kadlecsik; [EMAIL PROTECTED]; 
[EMAIL PROTECTED]; Yair Itzhaki; 
linux-kernel@vger.kernel.org

Subject: Re: Re-routing packets via netfilter (ip_rt_bug)


Herbert Xu wrote:
   


Here is another reason why these packets should go through FORWARD.
They were generated in response to packets in INPUT/FORWARD/OUTPUT.
The original packet has not undergone SNAT in any of these cases.

However, if we feed the response packet through LOCAL_OUT it will
be subject to DNAT.  This creates a NAT asymmetry and we may end
up with the wrong destination address.

By pushing it through FORWARD it will only undergo SNAT which is
correct since the original packet would have undergone DNAT.
 


This is only a problem since the recent NAT changes, but I agree
that we should fix it by moving these packets to FORWARD.

Regards
Patrick

   



 





Re: Thread_Id

2005-07-14 Thread Arjan van de Ven

> >
> So then what is the meaning of that typedef and why its still there ?

the typedef means that the *IMPLEMENTATION* uses an unsigned long to
store its cookie in.

> 
> >Other implementations are allowed to use different types for this. In
> >fact, I'd be surprised if NPTL and LinuxThreads would have the same
> >type... (they'll have the same size for ABI compat reasons of course,
> >but type... not so sure).
> >
> >  
> >
> I haven't faced the same returns with 2.4.18. So why is it so with 2.6.x 
> kernels ? pthread_self() on 2.4.18 was returning the same as gettid() 
> with 2.6.x.

pure luck. NPTL threading uses it to store a pointer to per thread info
structure; other threading (linuxthreads) may have stored a pid there to
identify the internal thread. nptl is 2.6 only so you might have
switched implementation of threading when you switched kernels.





Re: Open source firewalls

2005-07-14 Thread RVK
Proxies can be a good way of filtering but it can't avoid buffer 
overflows. It can only increase it. More code more bugs. If it is 
running on a hardware firewall as a service then it's more dangerous as 
once it is compromised then IDS signatures also can be deleted :-). No 
use of IDS then, right?
So the best way is either make your code free of buffer overflows or use 
some library which controls the attack during any buffer overflow or use 
Stack Randomisation and Canary based approaches.


rvk

Helge Hafting wrote:

> RVK wrote:
>
> > I don't think buffer overflow has anything to do with transparent
> > proxy. Transparent proxying is just doing some protocol filtering.
>
> A transparent proxy is a protocol filter, which is why it is an
> ideal way of detecting protocol-dependent buffer overflow attacks.
>
> The detection code has to be built into the proxy, of course.
>
> Examples:
> A web proxy can check for anomalously long "get" requests;
> there have been web servers with buffer overflows when the
> URL was too long.  The proxy can terminate such connections,
> protecting the possibly vulnerable webserver.
>
> An ftp proxy can check for (and remove) anomalously long filenames,
> as well as funnies like "ls */*/*/*/*/*"
>
> Similar for many other services.  The proxy approach is useful
> because knowledge of the protocol is necessary.  After all,
> it is ok to up/download a huge file via ftp, while a 2M filename
> is suspicious.  Size alone is not enough.
>
> > Still the proxy code may have some buffer overflows.
>
> A proxy (or any other attempt at a firewall) may have its own
> holes of course, but avoiding making them isn't that hard.
>
> > The best way is first to try avoiding any buffer overflows and take
> > programming precautions.
>
> Of course, if you have the source and that source isn't an
> unmaintainable mess.  One or both of those conditions may fail,
> and then the IDS becomes useful.
>
> > Other way is to chroot the services, if running it on a firewall.
>
> Provided it is an unixish server . . .
>
> > There are various mechanisms which can be used like bounding the
> > memory region it self. Stack Randomisation and Canary based approaches
> > can also avoid any buffer overflow attacks.
>
> These may or may not be available.  You can always stick a proxy
> firewall in front of the server though, no matter what os and
> server apps it runs.
>
> Helge Hafting





Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Vojtech Pavlik
On Thu, Jul 14, 2005 at 12:25:40PM +0200, Krzysztof Halasa wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > And in short-term things, the timeval/jiffie conversion is likely to be a 
> > _bigger_ issue than the crystal frequency conversion.
> >
> > So we should aim for a HZ value that makes it easy to convert to and from
> > the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all
> > good values for that reason. 864 is not.
> 
> Probably only theoretical, and probably the hardware isn't up to it...
> But what if we have:
> - 64-bit jiffies done in hardware (a counter). 1 cycle = 1 microsecond
>   or even a CPU clock cycle. Can *APIC or another HPET do that?

HPETs have a fixed frequency (usually 14.31818 MHz, but that depends
on the manufacturer).

> - 64-bit "match timer" (i.e., a register in the counter which fires IRQ
>   when it matches the counter value)

That's implemented in the HPET hardware.

> - the CPU(s) sorting the timer list and programming "match timer" with
>   software timer next to be executed. Upon firing the timer, a new "next
>   to be executed" timer would be programmed into the counter's "match
>   timer".
> 
> We would have no timer ticks when nobody requested them - the CPUs would
> be allowed to sleep for, say, even 50 ms when no task is RUNNING.
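
The scheme in the quoted list amounts to keeping the match register programmed with the earliest pending expiry. A minimal sketch, with made-up names and a plain array scan where a real implementation would keep the timers sorted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMER UINT64_MAX	/* nothing pending: leave the match register idle */

/*
 * Earliest expiry among pending software timers, in counter units.
 * A real implementation would keep the timers sorted instead of scanning.
 */
static uint64_t next_match(const uint64_t *expiry, size_t n)
{
	uint64_t best = NO_TIMER;
	size_t i;

	assert(expiry != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (expiry[i] < best)
			best = expiry[i];
	return best;
}

/* How long the CPU may sleep before the next interrupt, given 'now'. */
static uint64_t sleep_ticks(const uint64_t *expiry, size_t n, uint64_t now)
{
	uint64_t m = next_match(expiry, n);

	if (m == NO_TIMER)
		return NO_TIMER;
	return m > now ? m - now : 0;	/* already due: fire immediately */
}
```

With no timers pending there is nothing to program at all, which is where the "no timer ticks when nobody requested them" property comes from.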

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


Re: Open source firewalls

2005-07-14 Thread Helge Hafting

RVK wrote:

> I don't think buffer overflow has anything to do with transparent
> proxy. Transparent proxying is just doing some protocol filtering.

A transparent proxy is a protocol filter, which is why it is an
ideal way of detecting protocol-dependent buffer overflow attacks.

The detection code has to be built into the proxy, of course.

Examples:
A web proxy can check for anomalously long "get" requests;
there have been web servers with buffer overflows when the
URL was too long.  The proxy can terminate such connections,
protecting the possibly vulnerable webserver.

An ftp proxy can check for (and remove) anomalously long filenames,
as well as funnies like "ls */*/*/*/*/*"

Similar for many other services.  The proxy approach is useful
because knowledge of the protocol is necessary.  After all,
it is ok to up/download a huge file via ftp, while a 2M filename
is suspicious.  Size alone is not enough.

> Still the proxy code may have some buffer overflows.

A proxy (or any other attempt at a firewall) may have its own
holes of course, but avoiding making them isn't that hard.

> The best way is first to try avoiding any buffer overflows and take
> programming precautions.

Of course, if you have the source and that source isn't an
unmaintainable mess.  One or both of those conditions may fail,
and then the IDS becomes useful.

> Other way is to chroot the services, if running it on a firewall.

Provided it is an unixish server . . .

> There are various mechanisms which can be used like bounding the
> memory region it self. Stack Randomisation and Canary based approaches
> can also avoid any buffer overflow attacks.

These may or may not be available.  You can always stick a proxy
firewall in front of the server though, no matter what os and
server apps it runs.
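
The web proxy example from earlier in this message, checking the URL length before passing a request on, could be sketched as follows. The limit and the function name are made up for illustration; a real proxy would parse the protocol far more carefully.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_URL 2048	/* illustrative limit; choose per protected server */

/*
 * Return 1 if the request line should be passed on, 0 if the proxy should
 * drop the connection.  'line' is one NUL-terminated HTTP request line.
 */
static int toy_allow_request(const char *line)
{
	const char *url, *end;

	assert(line != NULL);
	url = strchr(line, ' ');	/* skip the method, e.g. "GET" */
	if (url == NULL)
		return 0;		/* malformed: no URL at all */
	url++;
	end = strchr(url, ' ');		/* URL ends at the next space, if any */
	if (end == NULL)
		end = url + strlen(url);
	return (size_t)(end - url) <= TOY_MAX_URL;
}
```

Note that the check is on the URL field, not on the size of the whole transfer, which is the "size alone is not enough" point: the proxy has to know which part of the protocol it is looking at.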

Helge Hafting


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Vojtech Pavlik
On Thu, Jul 14, 2005 at 09:42:18AM +0200, Arjan van de Ven wrote:

> > IOW, nothing ever sees any "variable frequency", and there's never any 
> > question about what the timer tick is: the timer tick is 2kHz as far as 
> > everybody is concerned. It's just that the ticks sometimes come in 
> > "bunches of 20".
> 
> btw we can hide all of this a lot nicer from just about the entire
> kernel by reducing the usage of both HZ and jiffies in drivers/non
> platform code. That isn't hard; msleep() is a good step forward there
> already; the next step is a nicer api for add_timer/mod_timer that is
> both relative and in milliseconds; with those 2 the majority of code that
> has "knowledge" about this shrinks to near zero. Once we have that the
> actual implementation of this in the background matters a whole lot
> less.
 
A note on the relative timer API: There needs to be a way to say
"x milliseconds from the time this timer should have triggered" instead
of "x milliseconds from now", to avoid skew in timers that try to be
strictly periodic.

But other than that - such an API would be a great thing for drivers.
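
To make the skew concrete, here is a minimal user-space sketch (plain C
arithmetic, not kernel code and not a proposed API; all names are made up
for illustration). It re-arms a periodic timer either relative to "now" or
relative to the previous deadline, with the handler always running a
little late.

```c
#include <assert.h>

/* Re-arm "x ms from now": any handler latency is added to every period. */
static long rearm_from_now(long now_ms, long period_ms)
{
    return now_ms + period_ms;
}

/* Re-arm "x ms from when the timer should have fired": latency is absorbed. */
static long rearm_from_deadline(long deadline_ms, long period_ms)
{
    return deadline_ms + period_ms;
}

/*
 * Simulate `ticks` periods in which each handler runs `late_ms` after its
 * deadline. Returns how far the last deadline has drifted from the ideal
 * value of ticks * period_ms.
 */
static long drift_after(int ticks, long period_ms, long late_ms, int from_now)
{
    long deadline = period_ms;

    assert(ticks > 0 && period_ms > 0 && late_ms >= 0);

    for (int i = 1; i < ticks; i++) {
        long now = deadline + late_ms;   /* the handler runs late */

        deadline = from_now ? rearm_from_now(now, period_ms)
                            : rearm_from_deadline(deadline, period_ms);
    }
    return deadline - (long)ticks * period_ms;
}
```

With a 10 ms period and 1 ms of latency per tick, 100 ticks re-armed from
"now" end up 99 ms late, while re-arming from the previous deadline keeps
the drift at zero.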

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


[2.6-GIT] NTFS: Release 2.1.23.

2005-07-14 Thread Anton Altaparmakov
Hi Linus, please pull from

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs-2.6.git/HEAD

This is a big NTFS update.  It was meant for as soon as 2.6.12 was released
but it was delayed due to the need for a patch I submitted to Andrew for -mm
to make it to the mainline kernel (which it has as of yesterday).

This update includes lots of fixes including a really nasty deadlock that with
recent kernels was triggered with 100% probability on umount of an NTFS volume
so it is important to go in before 2.6.13 is released.

Please apply.  Thanks!

Best regards,

Anton
-- 
Anton Altaparmakov  (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

This will update the following files:

 Documentation/filesystems/ntfs.txt |   29 +
 fs/ntfs/ChangeLog                  |  179 -
 fs/ntfs/Makefile                   |    4
 fs/ntfs/aops.c                     |  166 +---
 fs/ntfs/attrib.c                   |  630 +
 fs/ntfs/attrib.h                   |   16
 fs/ntfs/compress.c                 |   46 +-
 fs/ntfs/debug.c                    |   15
 fs/ntfs/dir.c                      |   32 -
 fs/ntfs/file.c                     |    2
 fs/ntfs/index.c                    |   16
 fs/ntfs/inode.c                    |  530 ++--
 fs/ntfs/inode.h                    |    7
 fs/ntfs/layout.h                   |   87 ++--
 fs/ntfs/lcnalloc.c                 |   72 +--
 fs/ntfs/logfile.c                  |   11
 fs/ntfs/mft.c                      |  227
 fs/ntfs/namei.c                    |   34 +
 fs/ntfs/ntfs.h                     |    8
 fs/ntfs/runlist.c                  |  278 ++
 fs/ntfs/runlist.h                  |   16
 fs/ntfs/super.c                    |  692 -
 fs/ntfs/sysctl.c                   |    4
 fs/ntfs/time.h                     |    4
 fs/ntfs/types.h                    |   10
 fs/ntfs/unistr.c                   |    2
 fs/ntfs/usnjrnl.c                  |   84
 fs/ntfs/usnjrnl.h                  |  205 ++
 fs/ntfs/volume.h                   |   12
 29 files changed, 2522 insertions(+), 896 deletions(-)

through these ChangeSets:

commit ba6d2377c85c9b8a793f455d8c9b6cf31985d70f
tree 21e65c76db693869c84864af02e91c4b997a6ba5
parent af859a42d798f047fbfe198ed315a942662c39d2
author Anton Altaparmakov <[EMAIL PROTECTED]> Sun, 26 Jun 2005 22:12:02 +0100
committer Anton Altaparmakov <[EMAIL PROTECTED]> Sun, 26 Jun 2005 22:12:02 +0100

NTFS: Fix a nasty deadlock that appeared in recent kernels.

The situation: VFS inode X on a mounted ntfs volume is dirty.  For
same inode X, the ntfs_inode is dirty and thus corresponding on-disk
inode, i.e. mft record, which is in a dirty PAGE_CACHE_PAGE belonging
to the table of inodes, i.e. $MFT, inode 0.

What happens:

Process 1: sys_sync()/umount()/whatever...  calls
__sync_single_inode() for $MFT -> do_writepages() -> write_page for
the dirty page containing the on-disk inode X, the page is now locked
-> ntfs_write_mst_block() which clears PageUptodate() on the page to
prevent anyone else getting hold of it whilst it does the write out.
This is necessary as the on-disk inode needs "fixups" applied before
the write to disk which are removed again after the write and
PageUptodate is then set again.  It then analyses the page looking
for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this
on-disk inode.  This then calls ilookup5() to check if the
corresponding VFS inode is in icache().  This in turn calls ifind()
which waits on the inode lock via wait_on_inode whilst holding the
global inode_lock.

Process 2: pdflush results in a call to __sync_single_inode for the
same VFS inode X on the ntfs volume.  This locks the inode (I_LOCK)
then calls write-inode -> ntfs_write_inode -> map_mft_record() ->
read_cache_page() for the page (in page cache of table of inodes
$MFT, inode 0) containing the on-disk inode.  This page has
PageUptodate() clear because of Process 1 (see above) so
read_cache_page() blocks when it tries to take the page lock for the
page so it can call ntfs_read_page().

Thus Process 1 is holding the page lock on the page containing the
on-disk inode X and it is waiting on the inode X to be unlocked in
ifind() so it can write the page out and then unlock the page.

And Process 2 is holding the inode lock on inode X and is waiting for
the page to be unlocked so it can call ntfs_readpage() or discover
that Process 1 set PageUptodate() again and use the page.

Thus we have a deadlock due to ifind() waiting on the inode lock.
The solution: The fix is 
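
The deadlock described in this commit message (not the fix) can be modelled
as a cycle in a wait-for graph: each process holds one lock and waits for
the one the other holds. The following is only a toy user-space sketch of
that reasoning; the enum names and struct are invented for illustration and
have nothing to do with the actual NTFS or VFS code.

```c
#include <assert.h>

enum { PROC1, PROC2, NPROC };
enum { PAGE_LOCK, INODE_LOCK, NLOCK };

struct wait_state {
    int holder[NLOCK];     /* process holding each lock, -1 if free */
    int waits_for[NPROC];  /* lock each process is blocked on, -1 if none */
};

/*
 * Follow "waits for lock -> held by process" edges starting at `start`.
 * Returns 1 if the chain leads back to `start` (a deadlock cycle),
 * 0 if it ends at a free lock or at a process that is not blocked.
 */
static int in_deadlock(const struct wait_state *s, int start)
{
    int p = start;

    assert(s != 0 && start >= 0 && start < NPROC);

    for (int steps = 0; steps < NPROC; steps++) {
        int lock = s->waits_for[p];

        if (lock < 0)
            return 0;          /* p is runnable, so the chain can unwind */
        p = s->holder[lock];
        if (p < 0)
            return 0;          /* lock is free */
        if (p == start)
            return 1;          /* came back around: cycle */
    }
    return 0;
}
```

Setting holder = { PROC1, PROC2 } and waits_for = { INODE_LOCK, PAGE_LOCK }
reproduces the situation above: Process 1 holds the page lock and waits on
the inode, Process 2 holds the inode lock and waits on the page, and
in_deadlock() reports a cycle for both.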

Re: GIT tree broken? (rsync depreciated)

2005-07-14 Thread Christian Kujau
Stelian Pop wrote:
> After resyncing cogito to the latest version (which incorporates the
> 'pack' changes, which were causing the failure), it does indeed work
> again, when using rsync.
> 

hm, i haven't updated my git-tree (linux-2.6.git) for a while and i got
similar error messages. i updated cogito to:

% cg-version
cogito-0.12.1 (cbec08d191d36126ddaf021961cc8995794b4a72)

and the "cannot map sha1 file..." errors went away. now i get:

Applying changes...
error: Could not read 043d051615aa5da09a7e44f1edbb69798458e067
error: Could not read 043d051615aa5da09a7e44f1edbb69798458e067
error: Could not read c101f3136cc98a003d0d16be6fab7d0d950581a6
error: Could not read c101f3136cc98a003d0d16be6fab7d0d950581a6
error: Could not read c101f3136cc98a003d0d16be6fab7d0d950581a6
error: Could not read a18bcb7450840f07a772a45229de4811d930f461
Merging 99f95e5286df2f69edab8a04c7080d986ee4233b ->
514fd7fd01d378a7b5584c657d9807fc28f22079
to 62351cc38d3eaf3de0327054dd6ebc039f4da80d...
fatal: failed to unpack tree object bda3910b7737a4fac464792657ffedcba185d799
cg-merge: git-read-tree failed (merge likely blocked by local changes)

i *think* i did not make any local changes, but if i did - i want to get
rid of them and want a clean tree.

cg-status prints a lot of files with a "D" in front of it but "cg-status
-h" does not know about the "D" status flag

any hints for this one?

thank you,
Christian.
-- 
BOFH excuse #378:

Operators killed by year 2000 bug bite.



[patch 2.6] remove PCI_BRIDGE_CTL_VGA handling from setup-bus.c

2005-07-14 Thread Ivan Kokshaysky
The setup-bus code doesn't work correctly for configurations
with more than one display adapter in the same PCI domain.
This stuff actually is a leftover of an early 2.4 PCI setup code
and apparently it stopped working after some "bridge_ctl" changes.
So the best thing we can do is just to remove it and rely on the fact
that any firmware *has* to configure VGA port forwarding for the boot
display device properly.

But then we need to ensure that the bus->bridge_ctl will always
contain valid information collected at the probe time, therefore
the following change in pci_scan_bridge() is needed.
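
The effect of the pci_scan_bridge() change can be seen in isolation with a
small sketch. The two constants are local stand-ins for the bridge control
bits (bit 2 and bit 3 of the PCI bridge control register), and the helper
names are invented for this example; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for PCI_BRIDGE_CTL_NO_ISA and PCI_BRIDGE_CTL_VGA. */
#define BRIDGE_CTL_NO_ISA 0x04
#define BRIDGE_CTL_VGA    0x08

/* Old behaviour: whatever was read at probe time is thrown away. */
static uint16_t child_ctl_before(uint16_t bctl)
{
    (void)bctl;
    return BRIDGE_CTL_NO_ISA;
}

/* New behaviour: keep the probed bits, then force NO_ISA on. */
static uint16_t child_ctl_after(uint16_t bctl)
{
    return (uint16_t)(bctl | BRIDGE_CTL_NO_ISA);
}
```

If firmware enabled VGA forwarding on the bridge, the old assignment loses
that bit while the new one keeps it, which is what the setup-bus.c removal
relies on.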

Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>

--- 2.6.13-rc3/drivers/pci/probe.c  Thu Jul 14 11:09:52 2005
+++ linux/drivers/pci/probe.c   Thu Jul 14 11:22:06 2005
@@ -507,7 +507,7 @@ int __devinit pci_scan_bridge(struct pci
pci_write_config_dword(dev, PCI_PRIMARY_BUS, buses);
 
if (!is_cardbus) {
-   child->bridge_ctl = PCI_BRIDGE_CTL_NO_ISA;
+   child->bridge_ctl = bctl | PCI_BRIDGE_CTL_NO_ISA;
/*
 * Adjust subordinate busnr in parent buses.
 * We do this before scanning for children because
--- 2.6.13-rc3/drivers/pci/setup-bus.c  Thu Jul 14 11:09:52 2005
+++ linux/drivers/pci/setup-bus.c   Thu Jul 14 11:22:54 2005
@@ -51,8 +51,6 @@ pbus_assign_resources_sorted(struct pci_
struct resource_list head, *list, *tmp;
int idx;
 
-   bus->bridge_ctl &= ~PCI_BRIDGE_CTL_VGA;
-
head.next = NULL;
list_for_each_entry(dev, &bus->devices, bus_list) {
u16 class = dev->class >> 8;
@@ -62,10 +60,6 @@ pbus_assign_resources_sorted(struct pci_
class == PCI_CLASS_BRIDGE_HOST)
continue;
 
-   if (class == PCI_CLASS_DISPLAY_VGA ||
-   class == PCI_CLASS_NOT_DEFINED_VGA)
-   bus->bridge_ctl |= PCI_BRIDGE_CTL_VGA;
-
pdev_sort_resources(dev, &head);
}
 
@@ -509,12 +503,6 @@ pci_bus_assign_resources(struct pci_bus 
 
pbus_assign_resources_sorted(bus);
 
-   if (bus->bridge_ctl & PCI_BRIDGE_CTL_VGA) {
-   /* Propagate presence of the VGA to upstream bridges */
-   for (b = bus; b->parent; b = b->parent) {
-   b->bridge_ctl |= PCI_BRIDGE_CTL_VGA;
-   }
-   }
list_for_each_entry(dev, &bus->devices, bus_list) {
b = dev->subordinate;
if (!b)


Re: console remains blanked

2005-07-14 Thread Jan Engelhardt

>Before 2.6.12-rc2, the console was unblanked by just
>writing to the console.
>For keyboardless and mouseless systems (which is my
>case, embedded) this new behaviour is a bit annoying.

Interesting. I have observed the following (2.6.13-rc1 and a little
earlier):
mplayer bla.avi -vo cvidix
After the blanking time, all chars turn black[1] but are still "visible"
thanks to the movie in the background - a VGA palette manipulation of entries
0-15, as it seems. This is quite different from filling the 80x25 screen
with space characters.



Jan Engelhardt


Patch to make mount follow a symlink at /etc/mtab

2005-07-14 Thread Thomas Hood

I attach a patch that modifies the mount program in the util-linux
package so that if /etc/mtab is a symbolic link (to a location outside
of /proc) then mount accesses mtab at the target of the symbolic link.

This feature is useful when the root filesystem is mounted read-only;
/etc/mtab can then be symlinked to a location on a writable filesystem.
In the long run mtab should be eliminated entirely but in the meantime
it is nice to be able to relocate the file.

The patch deals correctly with the fact that mount creates lock files
in the same directory as mtab.
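
The idea can be sketched in user space roughly as follows. This is a
simplified illustration, not the patch itself: the helper names are made
up, PROC_PREFIX stands in for the patch's PATH_PROC, and LOCK_SUFFIX is an
assumed value for MTAB_LOCK_SUFFIX.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define PROC_PREFIX "/proc/"  /* assumption: stand-in for PATH_PROC */
#define LOCK_SUFFIX "~"       /* assumption: stand-in for MTAB_LOCK_SUFFIX */

/* Return 1 if path lies under /proc/, else 0. */
static int path_is_under_proc(const char *path)
{
    return strncmp(path, PROC_PREFIX, sizeof(PROC_PREFIX) - 1) == 0;
}

/*
 * Store in real_path (at least PATH_MAX bytes) the file that should
 * actually be written. Returns 0 on success, -1 if mtab is a symlink
 * whose target cannot be resolved or points into /proc.
 */
static int resolve_mtab(const char *mtab, char *real_path)
{
    struct stat st;

    assert(mtab != NULL && real_path != NULL);
    if (strlen(mtab) >= PATH_MAX)
        return -1;

    if (lstat(mtab, &st) != 0 || !S_ISLNK(st.st_mode)) {
        /* Missing, or a plain file: use the path as given. */
        strcpy(real_path, mtab);
        return 0;
    }

    /* A symlink: follow it, but refuse targets inside /proc. */
    if (realpath(mtab, real_path) == NULL || path_is_under_proc(real_path)) {
        real_path[0] = '\0';
        return -1;
    }
    return 0;
}

/* Build the lock file name next to the resolved mtab, not next to the link. */
static int mtab_lock_name(const char *real_path, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", real_path, LOCK_SUFFIX);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Deriving the lock name from the resolved path is the point of the last
helper: with a read-only root, a lock file beside the symlink in /etc
could not be created.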

This patch also fixes a bug in umount.c whereby umount will update mtab
in some circumstances even though the -n option has been given.

I wrote the patch in August 2003 and submitted it to the util-linux
maintainer at that time.  He said that he would apply it if it proved
to be reliable after some testing.  The patch has been on my web page

http://panopticon.csustan.edu/thood/readonly-root.html

for almost two years since then, updated from time to time as new
versions of util-linux were released.  I have advertised the patch in
various forums and I have used the patch myself for a long time.  No
problems have ever been reported.

The latest version of the patch applies to versions 2.12p and 2.12q.

util-linux-2.12q-symlinkmtab_jdth20050709.patch

I tested it by patching the latest Debian and Ubuntu packages.  In
order for the latter to build I had to modify 10fstab.dpatch as well.
I attach the patch for that file too.

util-linux-2.12q-symlinkmtab-10fstab_jdth20050709.patch

--
Thomas Hood
diff -uNr util-linux-2.12p_ORIG/mount/fstab.c util-linux-2.12p/mount/fstab.c
--- util-linux-2.12p_ORIG/mount/fstab.c	2004-12-21 20:09:24.0 +0100
+++ util-linux-2.12p/mount/fstab.c	2005-07-09 11:53:19.0 +0200
@@ -1,7 +1,10 @@
-/* 1999-02-22 Arkadiusz Miśkiewicz <[EMAIL PROTECTED]>
+/*
+ * 1999-02-22 Arkadiusz Miśkiewicz <[EMAIL PROTECTED]>
  * - added Native Language Support
- * Sun Mar 21 1999 - Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ * 1999-03-21 Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
  * - fixed strerr(errno) in gettext calls
+ * 2003-08-08 Thomas Hood <[EMAIL PROTECTED]> with help from Patrick McLean
+ * - Write through a symlink at /etc/mtab if it doesn't point into /proc/
  */
 
 #include 
@@ -11,67 +14,129 @@
 #include 
 #include "mntent.h"
 #include "fstab.h"
+#include "realpath.h"
 #include "sundries.h"
 #include "xmalloc.h"
 #include "mount_blkid.h"
 #include "paths.h"
 #include "nls.h"
 
-#define streq(s, t)	(strcmp ((s), (t)) == 0)
-
-#define PROC_MOUNTS		"/proc/mounts"
-
-
 /* Information about mtab. */
-static int have_mtab_info = 0;
-static int var_mtab_does_not_exist = 0;
-static int var_mtab_is_a_symlink = 0;
+/* A 64 bit number can be displayed in 20 decimal digits */
+#define LEN_LARGEST_PID 20
+#define MTAB_PATH_MAX (PATH_MAX - (sizeof(MTAB_LOCK_SUFFIX) - 1) - LEN_LARGEST_PID)
+static char mtab_path[MTAB_PATH_MAX];
+static char mtab_lock_path[PATH_MAX];
+static char mtab_lock_targ[PATH_MAX];
+static char mtab_temp_path[PATH_MAX];
 
-static void
+/*
+ * Set mtab_path to the real path of the mtab file
+ * or to the null string if that path is inaccessible
+ *
+ * Run this early
+ */
+void
 get_mtab_info(void) {
 	struct stat mtab_stat;
 
-	if (!have_mtab_info) {
-		if (lstat(MOUNTED, &mtab_stat))
-			var_mtab_does_not_exist = 1;
-		else if (S_ISLNK(mtab_stat.st_mode))
-			var_mtab_is_a_symlink = 1;
-		have_mtab_info = 1;
+	if (lstat(MOUNTED, &mtab_stat)) {
+		/* Assume that the lstat error means that the file does not exist */
+		/* (Maybe we should check errno here) */
+		strcpy(mtab_path, MOUNTED);
+	} else if (S_ISLNK(mtab_stat.st_mode)) {
+		/* Is a symlink */
+		int len;
+		char *r = myrealpath(MOUNTED, mtab_path, MTAB_PATH_MAX);
+		mtab_path[MTAB_PATH_MAX - 1] = 0; /* Just to be sure */
+		len = strlen(mtab_path);
+		if (
+			r == NULL
+			|| len == 0
+			|| len >= (MTAB_PATH_MAX - 1)
+			|| streqn(mtab_path, PATH_PROC, sizeof(PATH_PROC) - 1)
+		) {
+			/* Real path invalid or inaccessible */
+			mtab_path[0] = '\0';
+			return;
+		}
+		/* mtab_path now contains mtab's real path */
+	} else {
+		/* Exists and is not a symlink */
+		strcpy(mtab_path, MOUNTED);
 	}
+
+	sprintf(mtab_lock_path, "%s%s", mtab_path, MTAB_LOCK_SUFFIX);
+	sprintf(mtab_lock_targ, "%s%s%d", mtab_path, MTAB_LOCK_SUFFIX, getpid());
+	sprintf(mtab_temp_path, "%s%s", mtab_path, MTAB_TEMP_SUFFIX);
+
+	return;
 }
 
-int
-mtab_does_not_exist(void) {
-	get_mtab_info();
-	return var_mtab_does_not_exist;
+/*
+ * Tell whether or not the mtab real path is accessible
+ *
+ * get_mtab_info() must have been run
+ */
+static int
+mtab_is_accessible(void) {
+	return (mtab_path[0] != '\0');
 }
 
+/*
+ * Tell whether or not the mtab file currently exists
+ *
+ * Note that the answer here is independent of whether or
+ * not the file is writable, so if you are planning to create
+ * the mtab file then check 

Re: Thread_Id

2005-07-14 Thread RVK

Ian Campbell wrote:

> On Thu, 2005-07-14 at 16:32 +0530, RVK wrote:
> > Ian Campbell wrote:
> > > What Arjan is saying is that pthread_t is a cookie -- this means that
> > > you cannot interpret it in any way, it is just a "thing" which you can
> > > pass back to the API, that pthread_t happens to be typedef'd to unsigned
> > > long int is irrelevant.
> >
> > Do you want to say for both 2.6.x and 2.4.x I should interpret that way ?
>
> As I understand it, yes, you should never try and assign any meaning to
> the values. The fact that you may have been able to find some apparent
> meaning under 2.4 is just a coincidence.

I am sorry, I don't agree on this. This confusion has been created only
because of the different behavior of pthread_self() on 2.4.18 and 2.6.x
kernels. I am looking to clarify my doubt. I can't digest this at all.

rvk



Re: Thread_Id

2005-07-14 Thread RVK

Arjan van de Ven wrote:

> On Thu, 2005-07-14 at 15:36 +0530, RVK wrote:
> > > it doesn't return a number it returns a pointer ;) or a floating point
> > > number. You don't know :)
> > >
> > > what it returns is a *cookie*. A cookie that you can only use to pass
> > > back to various pthread functions.
> >
> > Hahaha..common. Please clarify following
>
> I'm missing the joke

It's not a joke; it's a confusion created by the thread identifier.

> > SYNOPSIS
> >   #include <pthread.h>
> >
> >   pthread_t pthread_self(void);
> >
> > DESCRIPTION
> >   pthread_self return the thread identifier for the calling thread.
>
> *identifier*.
> It doesn't give a meaning beyond that, and if you look at other pthread
> manpages (say pthread_join) it just wants that identifier back. If you
> want to attach meaning to a thread identifier, please come up with a
> manpage/standard that actually defines the meaning of it.
>
> > bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;
>
> and here you
> 1) look at implementation details of your specific threading
> implementation and
> 2) you prove that your analysis is wrong since the implementation you
> look at defines it as *unsigned* so it can't be negative. So what your
> app does is clearly wrong even within the implementation you look at.

So then what is the meaning of that typedef, and why is it still there?

> Other implementations are allowed to use different types for this. In
> fact, I'd be surprised if NPTL and LinuxThreads would have the same
> type... (they'll have the same size for ABI compat reasons of course,
> but type... not so sure).

I haven't seen the same return values with 2.4.18. So why is it so with
2.6.x kernels? pthread_self() on 2.4.18 was returning the same as gettid()
with 2.6.x.

rvk

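On the gettid() comparison: if a numeric, kernel-visible thread ID is what
the application needs, one option on Linux is to ask the kernel for it
directly rather than read it out of pthread_t. A minimal sketch, assuming
Linux; kernel_thread_id is a made-up helper name, and the raw
syscall(SYS_gettid) form is used because older glibc versions provide no
gettid() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the kernel's ID for the calling thread (Linux-specific).
 * This value is unrelated to whatever pthread_self() returns.
 */
static pid_t kernel_thread_id(void)
{
    long tid = syscall(SYS_gettid);

    assert(tid > 0);
    return (pid_t)tid;
}
```

In the initial thread of a process this returns the same value as
getpid(); in other threads it differs. Either way it is a separate number
from pthread_t, which stays opaque.
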


Re: Thread_Id

2005-07-14 Thread Ian Campbell
On Thu, 2005-07-14 at 16:32 +0530, RVK wrote:
> Ian Campbell wrote:
> >What Arjan is saying is that pthread_t is a cookie -- this means that
> >you cannot interpret it in any way, it is just a "thing" which you can
> >pass back to the API, that pthread_t happens to be typedef'd to unsigned
> >long int is irrelevant.
> Do you want to say for both 2.6.x and 2.4.x I should interpret that way ?

As I understand it, yes, you should never try and assign any meaning to
the values. The fact that you may have been able to find some apparent
meaning under 2.4 is just a coincidence.

Ian.

-- 
Ian Campbell
Current Noise: Nile - Annihilation Of The Wicked

BOFH excuse #127:

Sticky bits on disk.



Re: Thread_Id

2005-07-14 Thread RVK

Ian Campbell wrote:

> On Thu, 2005-07-14 at 15:36 +0530, RVK wrote:
> > bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;
>
> That's an implementation detail which you cannot determine any
> information from.
>
> What Arjan is saying is that pthread_t is a cookie -- this means that
> you cannot interpret it in any way, it is just a "thing" which you can
> pass back to the API, that pthread_t happens to be typedef'd to unsigned
> long int is irrelevant.

Do you want to say for both 2.6.x and 2.4.x I should interpret that way ?

rvk



[PATCH] V4L: Bug fixes at tuner, cx88 and tea5767 (against 2.6.13-rc3)

2005-07-14 Thread Mauro Carvalho Chehab
- Bug fixes:
  1) In the CX88 code, some cards need to have audio reprogrammed after
     changing the video channel;
  2) Tuner autodetection code seems not to work on some cards. Now the
     no_autodetect insmod option allows disabling the autodetection code;
  3) Minor fixes in tea5767 to reduce integer truncation;
  4) Some new Pixelview Ultra Pro cards don't use the TEA5767 for radio.
     As autodetection is capable of checking for the tea, the radio tuner
     type and addresses were removed.

- CX88 version number incremented.

Signed-off-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>

 linux/drivers/media/video/cx88/cx88-cards.c |8 ++--
 linux/drivers/media/video/cx88/cx88-dvb.c   |2 -
 linux/drivers/media/video/cx88/cx88-video.c |7 +++-
 linux/drivers/media/video/cx88/cx88.h   |6 +--
 linux/drivers/media/video/tea5767.c |   34 +++-
 linux/drivers/media/video/tuner-core.c  |   31 +++---
 6 files changed, 52 insertions(+), 36 deletions(-)

diff -u linux-2.6.13/drivers/media/video/cx88/cx88.h linux/drivers/media/video/cx88/cx88.h
--- linux-2.6.13/drivers/media/video/cx88/cx88.h	2005-07-13 11:07:25.0 -0300
+++ linux/drivers/media/video/cx88/cx88.h	2005-07-14 07:32:17.0 -0300
@@ -1,5 +1,5 @@
 /*
- * $Id: cx88.h,v 1.68 2005/07/07 14:17:47 mchehab Exp $
+ * $Id: cx88.h,v 1.69 2005/07/13 17:25:25 mchehab Exp $
  *
  * v4l2 device driver for cx2388x based TV cards
  *
@@ -35,8 +35,8 @@
 #include "btcx-risc.h"
 #include "cx88-reg.h"
 
-#include 
-#define CX88_VERSION_CODE KERNEL_VERSION(0,0,4)
+#include 
+#define CX88_VERSION_CODE KERNEL_VERSION(0,0,5)
 
 #ifndef TRUE
 # define TRUE (1==1)
diff -u linux-2.6.13/drivers/media/video/cx88/cx88-cards.c linux/drivers/media/video/cx88/cx88-cards.c
--- linux-2.6.13/drivers/media/video/cx88/cx88-cards.c	2005-07-13 11:07:25.0 -0300
+++ linux/drivers/media/video/cx88/cx88-cards.c	2005-07-14 07:32:17.0 -0300
@@ -1,5 +1,5 @@
 /*
- * $Id: cx88-cards.c,v 1.85 2005/07/04 19:35:05 mkrufky Exp $
+ * $Id: cx88-cards.c,v 1.86 2005/07/14 03:06:43 mchehab Exp $
  *
  * device driver for Conexant 2388x based TV cards
  * card-specific stuff.
@@ -682,9 +682,9 @@
 		.name   = "PixelView PlayTV Ultra Pro (Stereo)",
 		/* May be also TUNER_YMEC_TVF_5533MF for NTSC/M or PAL/M */
 		.tuner_type = TUNER_PHILIPS_FM1216ME_MK3,
-		.radio_type = TUNER_TEA5767,
-		.tuner_addr	= 0xc2>>1,
-		.radio_addr	= 0xc0>>1,
+		.radio_type = UNSET,
+		.tuner_addr	= ADDR_UNSET,
+		.radio_addr	= ADDR_UNSET,
 		.input  = {{
 			.type   = CX88_VMUX_TELEVISION,
 			.vmux   = 0,
diff -u linux-2.6.13/drivers/media/video/cx88/cx88-video.c linux/drivers/media/video/cx88/cx88-video.c
--- linux-2.6.13/drivers/media/video/cx88/cx88-video.c	2005-07-13 11:07:25.0 -0300
+++ linux/drivers/media/video/cx88/cx88-video.c	2005-07-14 07:32:17.0 -0300
@@ -1,5 +1,5 @@
 /*
- * $Id: cx88-video.c,v 1.79 2005/07/07 14:17:47 mchehab Exp $
+ * $Id: cx88-video.c,v 1.80 2005/07/13 08:49:08 mchehab Exp $
  *
  * device driver for Conexant 2388x based TV cards
  * video4linux video interface
@@ -1346,6 +1346,11 @@
 		dev->freq = f->frequency;
 		cx88_newstation(core);
 		cx88_call_i2c_clients(dev->core,VIDIOC_S_FREQUENCY,f);
+
+		/* When changing channels it is required to reset TVAUDIO */
+		msleep (10);
+		cx88_set_tvaudio(core);
+
 		up(&dev->lock);
 		return 0;
 	}
diff -u linux-2.6.13/drivers/media/video/cx88/cx88-dvb.c linux/drivers/media/video/cx88/cx88-dvb.c
--- linux-2.6.13/drivers/media/video/cx88/cx88-dvb.c	2005-07-13 11:07:25.0 -0300
+++ linux/drivers/media/video/cx88/cx88-dvb.c	2005-07-14 07:32:17.0 -0300
@@ -1,5 +1,5 @@
 /*
- * $Id: cx88-dvb.c,v 1.41 2005/07/04 19:35:05 mkrufky Exp $
+ * $Id: cx88-dvb.c,v 1.42 2005/07/12 15:44:55 mkrufky Exp $
  *
  * device driver for Conexant 2388x based TV cards
  * MPEG Transport Stream (DVB) routines
diff -u linux-2.6.13/drivers/media/video/tuner-core.c linux/drivers/media/video/tuner-core.c
--- linux-2.6.13/drivers/media/video/tuner-core.c	2005-07-13 11:07:25.0 -0300
+++ linux/drivers/media/video/tuner-core.c	2005-07-14 07:32:17.0 -0300
@@ -1,5 +1,5 @@
 /*
- * $Id: tuner-core.c,v 1.55 2005/07/08 13:20:33 mchehab Exp $
+ * $Id: tuner-core.c,v 1.58 2005/07/14 03:06:43 mchehab Exp $
  *
  * i2c tv tuner chip device driver
  * core core, i.e. kernel interfaces, registering and so on
@@ -39,6 +39,9 @@
 static unsigned int addr = 0;
 module_param(addr, int, 0444);
 
+static unsigned int no_autodetect = 0;
+module_param(no_autodetect, int, 0444);
+
 /* insmod options used at runtime => read/write */
 unsigned int tuner_debug = 0;
 module_param(tuner_debug, int, 0644);
@@ -318,17 +321,19 @@
 	tuner_info("chip found @ 0x%x (%s)\n", addr << 1, adap->name);
 
 	/* TEA5767 autodetection code - only for addr = 0xc0 */
-	if (addr == 0x60) {
-		if (tea5767_autodetection(&t->i2c) != EINVAL) {
-			t->type = TUNER_TEA5767;
-			t->mode_mask = T_RADIO;
-			t->mode = T_STANDBY;
-			

Re: [patch 2.6.13-git] 8250 tweaks

2005-07-14 Thread Russell King
On Thu, Jul 14, 2005 at 12:12:02AM -0700, Sam Song wrote:
> It turned out the conflict of uart init definition 
> like MPC10X_UART0_IRQ in ../syslib/mpc10x_common.c 
> and SERIAL_PORT_DFNS in ../platform/sandpoint.h. By
> now, only MPC10X_UART0_IRQ stuff is needed. 
> SERIAL_PORT_DFNS should be omitted. 

Oh dear, it seems that I missed a load of fixups then.  I only
scanned include/asm-* for SERIAL_PORT_DFNS - and I stupidly
thought that this PPC "platform" directory would be in include/asm-ppc.

> Seems it's time for me to stand with Russell:-)

Well, in this case, the "whinging" resulted in finding a _real_ bug
and locating why your ports weren't being found.  So I guess it's
good for something.

Can you mail me a diff of the changes you made to
arch/ppc/platforms/sandpoint.h please?  If that file is being used
it seems that you actually have 4 ports defined in total.  However,
I'm a little confused because the sandpoint.h defines don't seem to
match your original dmesg output.

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


Re: Thread_Id

2005-07-14 Thread Ian Campbell
On Thu, 2005-07-14 at 15:36 +0530, RVK wrote:

> bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;

That's an implementation detail which you cannot determine any
information from.

What Arjan is saying is that pthread_t is a cookie -- this means that
you cannot interpret it in any way, it is just a "thing" which you can
pass back to the API, that pthread_t happens to be typedef'd to unsigned
long int is irrelevant.  

Ian.

-- 
Ian Campbell
Current Noise: Nile - Annihilation Of The Wicked

Don't tell me what you dreamed last night for I've been reading Freud.



2.6.13-rc3 ACPI regression and hang on x86-64

2005-07-14 Thread Mikael Pettersson
On my x86-64 laptop (Targa Visionary 811: Athlon64 + VIA chipset,
Arima OEM:d HW also sold by eMachines and others), ACPI is broken
and hangs the x86-64 2.6.13-rc3 kernel.

During boot, ACPI reduces the screen's brightness (it's always
done this in the x86-64 kernels but not the i386 ones), so I
have to press a specific key combination (Fn+F8) to increase the
brightness. This worked up to and including the 2.6.13-rc2 kernel,
but with 2.6.13-rc3 it causes an error message:

acpi_ec-0217 [04] acpi_ec_leave_burst_mo: --->status fail

on the console, and then the machine is hung hard.

With the i386 kernel, both this key combination and the other one
for reducing the brightness work as expected.

A diff between the dmesg logs for 2.6.13-rc2 and -rc3 (included below)
indicates that ACPI experiences several new errors in rc3.

/Mikael

--- dmesg-2.6.13-rc2-x86_64 2005-07-14 11:59:58.0 +0200
+++ dmesg-2.6.13-rc3-x86_64 2005-07-14 11:59:59.0 +0200
@@ -1,5 +1,5 @@
 Bootdata ok (command line is ro root=/dev/hda7)
-Linux version 2.6.13-rc2 ([EMAIL PROTECTED]) (gcc version 4.0.1) #1 Fri Jul 8 15:44:53 CEST 2005
+Linux version 2.6.13-rc3 ([EMAIL PROTECTED]) (gcc version 4.0.1) #1 Wed Jul 13 17:51:48 CEST 2005
 BIOS-provided physical RAM map:
  BIOS-e820:  - 0009f800 (usable)
  BIOS-e820: 0009f800 - 000a (reserved)
@@ -37,46 +37,49 @@
 Initializing CPU#0
 PID hash table entries: 2048 (order: 11, 65536 bytes)
 time.c: Using 1.193182 MHz PIT timer.
-time.c: Detected 1603.693 MHz processor.
+time.c: Detected 1603.705 MHz processor.
 time.c: Using PIT/TSC based timekeeping.
 Console: colour VGA+ 80x25
 Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
 Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
-Memory: 511408k/523200k available (1653k kernel code, 11012k reserved, 941k data, 128k init)
-Calibrating delay using timer specific routine.. 3211.68 BogoMIPS (lpj=16058428)
+Memory: 511408k/523200k available (1656k kernel code, 11012k reserved, 941k data, 128k init)
+Calibrating delay using timer specific routine.. 3211.67 BogoMIPS (lpj=16058383)
 Mount-cache hash table entries: 256
 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
 CPU: L2 Cache: 1024K (64 bytes/line)
+mtrr: v2.0 (20020519)
 CPU: Mobile AMD Athlon(tm) 64 Processor 2800+ stepping 0a
- tbxface-0118 [02] acpi_load_tables  : ACPI Tables successfully acquired
+ tbxface-0120 [02] acpi_load_tables  : ACPI Tables successfully acquired
 Parsing all Control Methods:
 Table [DSDT](id F005) - 482 Objects with 46 Devices 148 Methods 16 Regions
 Parsing all Control Methods:
 Table [SSDT](id F003) - 3 Objects with 0 Devices 0 Methods 0 Regions
-ACPI Namespace successfully loaded at root 803ac6e0
-evxfevnt-0094 [03] acpi_enable   : Transition to ACPI mode successful
+ACPI Namespace successfully loaded at root 803ad260
+evxfevnt-0096 [03] acpi_enable   : Transition to ACPI mode successful
 Using local APIC timer interrupts.
 Detected 12.528 MHz APIC timer.
 testing NMI watchdog ... OK.
 NET: Registered protocol family 16
+ACPI: bus type pci registered
 PCI: Using configuration type 1
-mtrr: v2.0 (20020519)
-ACPI: Subsystem revision 20050309
-evgpeblk-0979 [06] ev_create_gpe_block   : GPE 00 to 0F [_GPE] 2 regs on int 0xA
-evgpeblk-0987 [06] ev_create_gpe_block   : Found 7 Wake, Enabled 0 Runtime GPEs in this block
+ACPI: Subsystem revision 20050408
+evgpeblk-1016 [06] ev_create_gpe_block   : GPE 00 to 0F [_GPE] 2 regs on int 0xA
+evgpeblk-1024 [06] ev_create_gpe_block   : Found 7 Wake, Enabled 0 Runtime GPEs in this block
 Completing Region/Field/Buffer/Package initialization:...
 Initialized 16/16 Regions 0/0 Fields 18/18 Buffers 17/27 Packages (494 nodes)
-Executing all Device _STA and_INI methods:..[ACPI Debug] String: [0x24] " AC _STA"
+Executing all Device _STA and_INI methods:..[ACPI Debug]  String: [0x24] " AC _STA"
 ...
 49 Devices found containing: 49 _STA, 2 _INI methods
 ACPI: Interpreter enabled
 ACPI: Using IOAPIC for interrupt routing
-nsxfeval-0250 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
-nsxfeval-0250 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
-nsxfeval-0250 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
-nsxfeval-0250 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
+nsxfeval-0251 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
+nsxfeval-0251 [06] acpi_evaluate_object  : Handle is NULL and Pathname is relative
+nsxfeval-0251 [06] acpi_evaluate_object  : 

Re: Thread_Id

2005-07-14 Thread Arjan van de Ven
On Thu, 2005-07-14 at 15:36 +0530, RVK wrote:

> >
> >it doesn't return a number it returns a pointer ;) or a floating point
> >number. You don't know :)
> >
> >what it returns is a *cookie*. A cookie that you can only use to pass
> >back to various pthread functions.
> >
> >  
> >
> Hahaha..common. Please clarify following

I'm missing the joke

> SYNOPSIS
>#include <pthread.h>
> 
>pthread_t pthread_self(void);
> 
> DESCRIPTION
>pthread_self return the thread identifier for the calling thread.

*identifier*.
It doesn't give a meaning beyond that, and if you look at other pthread
manpages (say pthread_join) it just wants that identifier back. If you
want to attach meaning to a thread identifier, please come up with a
manpage/standard that actually defines the meaning of it.

> 
> bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;

and here you 
1) look at implementation details of your specific threading
implementation and 
2) you prove that your analysis is wrong since the implementation you
look at defines it as *unsigned* so it can't be negative. So what your
app does is clearly wrong even within the implementation you look at.


Other implementations are allowed to use different types for this. In
fact, I'd be surprised if NPTL and LinuxThreads would have the same
type... (they'll have the same size for ABI compat reasons of course,
but type... not so sure).
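
The "cookie" point can be made concrete with a small sketch. It assumes
nothing about how pthread_t is represented: the value is only ever handed
back to pthread functions, here pthread_equal(). The helper name
is_calling_thread() is made up for this illustration.

```c
#include <pthread.h>

/*
 * Nonzero if `id` refers to the calling thread.
 * The pthread_t is treated as opaque: it is never printed, cast, or
 * compared with ==, only passed back to pthread_equal().
 */
int is_calling_thread(pthread_t id)
{
	return pthread_equal(id, pthread_self()) != 0;
}
```

Code written this way keeps working whatever type a given threading
library picks for pthread_t.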



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Open source firewalls

2005-07-14 Thread RVK
I don't think buffer overflows have anything to do with transparent
proxying. A transparent proxy just does some protocol filtering, and the
proxy code may itself contain buffer overflows. The best approach is to
avoid buffer overflows in the first place by taking programming
precautions. Another is to chroot the services if they run on a
firewall. Various other mechanisms can be used, such as bounding the
memory region itself. Stack randomisation and canary-based approaches
can also help prevent buffer overflow attacks.
An IDS runs at L7; the best example is Snort. It is not possible for an
IDS to detect these attacks accurately.


rvk

Helge Hafting wrote:

> Vinay Venkataraghavan wrote:
>
> > I know how to implement buffer overflow attacks. But
> > how would an intrusion detection system detect a
> > buffer overflow attack.
>
> Buffer overflow attacks vary, but have one thing in common.  The
> overflow string is much longer than what's usual for the app/protocol in
> question.  It may also contain illegal characters, but be careful -
> non-english users use plenty of valid non-ascii characters in filenames,
> passwords and so on.
>
> The way to do this is to implement a transparent proxy module for every
> protocol you want to do overflow prevention for.  Collect the strings
> transmitted, pass them on after validating them.  Or reset the
> connection when one gets "too long".  For example, you may want to
> limit POP usernames to whatever the maximum username length is
> on your system.  But make such things configurable, others may
> want longer usernames than you.
>
> > My question is at the layer
> > that the intrusion detection system operates, how will
> > it know that a particular string for example is liable
> > to overflow a vulnerable buffer.
>
> It can't know of course, but it can suspect that 1000-character
> usernames, passwords or filenames is foul play and reset the
> connection.  Or 10k URL's . . .
>
> Helge Hafting



Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Krzysztof Halasa
Linus Torvalds <[EMAIL PROTECTED]> writes:

> And in short-term things, the timeval/jiffie conversion is likely to be a 
> _bigger_ issue than the crystal frequency conversion.
>
> So we should aim for a HZ value that makes it easy to convert to and from
> the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all
> good values for that reason. 864 is not.
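
The conversion argument can be checked with a little arithmetic. The
sketch below is only an illustration; the helper names are made up and
are not kernel functions. It tests whether one tick is a whole number of
microseconds, which is what makes converting between ticks and timeval
values cheap.

```c
#define USEC_PER_SEC 1000000u

/* Nonzero if one tick at `hz` lasts a whole number of microseconds. */
int tick_is_whole_usec(unsigned int hz)
{
	return hz != 0 && USEC_PER_SEC % hz == 0;
}

/* Length of one tick in microseconds, rounded down. */
unsigned int usec_per_tick(unsigned int hz)
{
	return USEC_PER_SEC / hz;
}
```

With these helpers, 100, 250 and 1000 give ticks of exactly 10000, 4000
and 1000 microseconds, while 864 leaves a remainder (1157 microseconds
plus 352/864 of a microsecond), so every conversion would need a real
division and would accumulate rounding error.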

Probably only theoretical, and probably the hardware isn't up to it...
But what if we have:
- 64-bit jiffies done in hardware (a counter). 1 cycle = 1 microsecond
  or even a CPU clock cycle. Can *APIC or another HPET do that?
- 64-bit "match timer" (i.e., a register in the counter which fires IRQ
  when it matches the counter value)
- the CPU(s) sorting the timer list and programming "match timer" with
  software timer next to be executed. Upon firing the timer, a new "next
  to be executed" timer would be programmed into the counter's "match
  timer".

We would have no timer ticks when nobody requested them - the CPUs would
be allowed to sleep for, say, even 50 ms when no task is RUNNING.
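
A rough userspace model of that scheme, purely as a sketch: the
free-running counter and the "match timer" register are simulated by
plain integers, and struct soft_timer, next_match() and fire_timers()
are hypothetical names, not existing kernel interfaces. A linear scan
stands in for the sorted timer list.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_TIMER UINT64_MAX	/* match value meaning "nothing pending" */

struct soft_timer {
	uint64_t expires;	/* counter value at which the timer fires */
	int active;
};

/* Earliest pending expiry: the value to program into the match register. */
uint64_t next_match(const struct soft_timer *t, size_t n)
{
	uint64_t best = NO_TIMER;
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].active && t[i].expires < best)
			best = t[i].expires;
	return best;
}

/*
 * Called when the match interrupt fires at counter value `now`:
 * expire everything that is due, then reprogram the match register.
 * Returns the number of timers expired.
 */
size_t fire_timers(struct soft_timer *t, size_t n, uint64_t now,
		   uint64_t *match)
{
	size_t i, fired = 0;

	for (i = 0; i < n; i++) {
		if (t[i].active && t[i].expires <= now) {
			t[i].active = 0;
			fired++;
		}
	}
	*match = next_match(t, n);
	return fired;
}
```

When no timer is pending the match value stays at NO_TIMER, so no
interrupt is requested and the CPU could stay asleep.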
-- 
Krzysztof Halasa


Re: console remains blanked

2005-07-14 Thread Albert Herranz
Hi,

 --- Jan Engelhardt <[EMAIL PROTECTED]> wrote:
> The console is unblanked when you hit a key (or
> probably move a mouse too), 
> not when some application outputs something on
> stdout/stderr/etc.

Before 2.6.12-rc2, the console was unblanked by just
writing to the console.
For keyboardless and mouseless systems (which is my
case, embedded) this new behaviour is a bit annoying.

> Which kernel versions have this patch? I'm on
> 2.6.13-rc1 and have no problems 
> with unblanking.

I have this problem since 2.6.12-rc2.
If I add back the patch hunk specified in my original
message, the blanking behaviour changes to that
present in pre-2.6.12-rc2 kernels.

I would just like to know whether this new behaviour is
intentional and makes sense to everyone (except
me :-)

Thanks for your feedback,
Albert






Re: Thread_Id

2005-07-14 Thread RVK

Arjan van de Ven wrote:

> On Thu, 2005-07-14 at 11:03 +0530, RVK wrote:
>
> > Robert Hancock wrote:
> >
> > > RVK wrote:
> > >
> > > > Can anyone suggest me how to get the threadId using 2.6.x kernels.
> > > > pthread_self() does not work and returns some -ve integer.
> > >
> > > What do you mean, negative integer? It's not an integer, it's a
> > > pthread_t, you're not even supposed to look at it..
> >
> > What is pthread_t inturn defined to ? pthread_self for 2.4.x thread
> > libraries return +ve number(as u have a objection me calling it as
> > integer :-))
>
> it doesn't return a number it returns a pointer ;) or a floating point
> number. You don't know :)
>
> what it returns is a *cookie*. A cookie that you can only use to pass
> back to various pthread functions.

Hahaha..common. Please clarify following
SYNOPSIS
  #include <pthread.h>

  pthread_t pthread_self(void);

DESCRIPTION
  pthread_self return the thread identifier for the calling thread.

bits/pthreadtypes.h:150:typedef unsigned long int pthread_t;

rvk




Re: Open source firewalls

2005-07-14 Thread Helge Hafting

Vinay Venkataraghavan wrote:

> I know how to implement buffer overflow attacks. But
> how would an intrusion detection system detect a
> buffer overflow attack.

Buffer overflow attacks vary, but have one thing in common.  The
overflow string is much longer than what's usual for the app/protocol in
question.  It may also contain illegal characters, but be careful -
non-english users use plenty of valid non-ascii characters in filenames,
passwords and so on.

The way to do this is to implement a transparent proxy module for every
protocol you want to do overflow prevention for.  Collect the strings
transmitted, pass them on after validating them.  Or reset the
connection when one gets "too long".  For example, you may want to
limit POP usernames to whatever the maximum username length is
on your system.  But make such things configurable, others may
want longer usernames than you.
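
A minimal sketch of such a per-field check, assuming the proxy has
already split the protocol stream into fields. The names field_limit,
check_field() and the verdict values are invented for this example, and
the limit is passed in so that it stays configurable.

```c
#include <stddef.h>

enum verdict { VERDICT_PASS, VERDICT_RESET };

struct field_limit {
	const char *name;	/* e.g. "pop-username", for logging only */
	size_t max_len;		/* configurable upper bound in bytes */
};

/* Decide whether a collected field is passed on or the connection reset. */
enum verdict check_field(const struct field_limit *limit,
			 const unsigned char *data, size_t len)
{
	size_t i;

	if (len > limit->max_len)
		return VERDICT_RESET;

	for (i = 0; i < len; i++) {
		/*
		 * Reject ASCII control bytes only.  Bytes >= 0x80 pass,
		 * so valid non-ASCII names are not treated as attacks.
		 */
		if (data[i] < 0x20 || data[i] == 0x7f)
			return VERDICT_RESET;
	}
	return VERDICT_PASS;
}
```

A length test like this cannot prove that a string would overflow a
particular buffer; it only flags input that is implausibly long for the
protocol field in question.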


> My question is at the layer
> that the intrusion detection system operates, how will
> it know that a particular string for example is liable
> to overflow a vulnerable buffer.

 


It can't know of course, but it can suspect that 1000-character
usernames, passwords or filenames is foul play and reset the
connection.  Or 10k URL's . . .

Helge Hafting



Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Maciej W. Rozycki
On Wed, 13 Jul 2005, Benjamin LaHaise wrote:

> That's one thing I truly dislike about the current timer code.  If we 
> could program the RTC interrupt to come into the system as an NMI (iirc 
> oprofile already has code to do this), we could get much better TSC 
> interpolation since we would be sampling the TSC at a much smaller, less 
> variable offset, which can only be a good thing.

 And we'd get a lot more crashes on broken systems that do not handle NMIs 
in the SMM -- this is the very reason the NMI watchdog is disabled these 
days by default.  A whole lot of systems simply cannot handle NMIs 
happening randomly.

 Programming an I/O APIC to deliver the RTC interrupt (or any other that's 
edge-triggered) as an NMI is itself trivial (we can do this for the PIT 
for the NMI watchdog already).

  Maciej


Re: [PATCH 0/19] Kconfig I18N completion

2005-07-14 Thread Jan Engelhardt
>> Patch 19/19 contains a .po file.
>
>Yes, the patch 19/19 contains the translation of configuration...
>I see Linus doesn't want the huge language files in kernel source.
>But what is Linus opinion about this little .po file?

What is little? Given that there are roughly 119 languages (find 
/usr/share/locale -type d -maxdepth 1 | wc -l), you would surely reconsider 
whether a 23KB file counts as "small" once 119 of them have been added.

As I perceive it, the policy is: no PO files in mainline at all. I'm fine with 
that.

Keeping the translations in sync with the mainline Kconfig help texts/etc. is 
also not an easy task unless you have a lot of time to spare.


Jan Engelhardt
-- 


Re: [PATCH] Fuse chardevice number

2005-07-14 Thread Miklos Szeredi
> >>  /** The minor number of the fuse character device */
> >> -#define FUSE_MINOR 229
> >> +#define FUSE_MINOR MISC_DYNAMIC_MINOR
> >
> >FUSE has an allocated fix minor.  Dynamic minor is much harder to
> >handle with legacy /dev (not udev).
> 
> How many users of 2.6.13 and up really do not have/run udev? [Please don't 
> send too many responses]

Don't be afraid, 2.6.13 is not yet released.  So the number of users
of udev under 2.6.13 is exactly zero ;)

> A module option could be added to specify an explicit minor.

That's just making it more complicated without any gain. An assigned
device number (if it exists) is exactly as good as a dynamic one.

Miklos



Re: 2.6.13-rc2-mm2

2005-07-14 Thread David Vrabel
Chuck Ebbert wrote:
>Looks like Quilt is adding the space during push/pop operations.  Only the
> lines it has touched in the series file have the trailing space.

Quilt versions prior to 0.39 would add a trailing space to the series
file entry when doing a quilt refresh with the default -p1 patch level.

David Vrabel


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Maciej W. Rozycki
On Wed, 13 Jul 2005, Lee Revell wrote:

> Did anyone else find this strange:
> 
> "The RTC is used in periodic mode to provide the system profiling
> interrupt on uni-processor systems and the clock interrupt on
> multi-processor systems."
> 
> We just take NR_CPUS * HZ timer interrupts per second, what's the
> advantage of using the RTC?

 It tends to work in the APIC mode all the time (with all systems), unlike 
the PIT which has "interesting" routing problems with its IRQ0, which 
you've probably already noticed.  Have a look at all the hassle in 
check_timer() if you want to double-check it.

 Of course using APIC internal timers is generally the best idea on SMP, 
but they may have had reasons to avoid them (it's not an ISA interrupt, so 
it could have been simply out of question in the initial design).

  Maciej


Re: [PATCH] Fuse chardevice number

2005-07-14 Thread Jan Engelhardt
Hi,

>>  /** The minor number of the fuse character device */
>> -#define FUSE_MINOR 229
>> +#define FUSE_MINOR MISC_DYNAMIC_MINOR
>
>FUSE has an allocated fix minor.  Dynamic minor is much harder to
>handle with legacy /dev (not udev).

How many users of 2.6.13 and up really do not have/run udev? [Please don't 
send too many responses]

A module option could be added to specify an explicit minor.


Jan Engelhardt
-- 


Re: 2.6.13-rc2-mm2

2005-07-14 Thread Johannes Stezenbach
On Wed, Jul 13, 2005 at 05:29:32PM -0400, Chuck Ebbert wrote:
> On Wed, 13 Jul 2005 at 00:23:42 -0700, Andrew Morton wrote:
> 
> >>...and BTW why does every line in the series file have a trailing space?
> >
> > Not in my copy of
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.13-rc2/2.6.13-rc2-mm2/patch-series
> > ?
> 
> 
>   Looks like Quilt is adding the space during push/pop operations.  Only the
> lines it has touched in the series file have the trailing space.

Nope. For me quilt leaves a trailing space if I add patches with -p0
to the series file and then do a "quilt refresh -p1".

Johannes


Re: PROBLEM: Oops when running mkreiserfs on large (9TB) raid0 set on AMD64 SMP

2005-07-14 Thread Paul Slootman
On Thu 14 Jul 2005, Neil Brown wrote:
> > Aug  9 20:09:18 localhost kernel: 
> > {:raid0:raid0_make_request+472}
> 
> Looks like the problem is at:
>   sector_div(x, (unsigned long)conf->hash_spacing);
>   zone = conf->hash_table[x];
[...]
> Anyway, the following patch, if it compiles, might change the
> behaviour of raid0 -- possibly even improve it :-)
> 
> Thanks for the report.
> 
> Success/failure reports of this patch would be most welcome.

Thanks for the quick fix. I just tried it again with your patch,
and now it works fine!

Filesystem            Size  Used Avail Use% Mounted on
/dev/md11             8.8T   33M  8.8T   1% /mnt

Very nice... :)


Paul Slootman


Re: console remains blanked

2005-07-14 Thread Jan Engelhardt

>Looks like, since [1] was merged, a blanked console
>(due to inactivity for example) doesn't get unblanked
>anymore when new output is written to it.

The console is unblanked when you hit a key (or probably move a mouse too), 
not when some application outputs something on stdout/stderr/etc.

>[1]
>http://marc.theaimsgroup.com/?l=linux-kernel&m=111052009232499&w=2

Which kernel versions have this patch? I'm on 2.6.13-rc1 and have no problems 
with unblanking.



Jan Engelhardt
-- 


Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt

2005-07-14 Thread Jan Engelhardt
 "My expectation is if we want to beat the competition, we'll want
 the ability to go *under* 100Hz."
>>> 
>>> What does Windows do here?
>>
>> windows xp base rate is 100Hz... but multimedia apps can ask for almost 
>
> 83Hz

Well, Windows 98 (vmmon) shows very different ones:

/dev/vmmon[4355]: host clock rate change request 0 -> 19
/dev/vmmon[4355]: host clock rate change request 19 -> 0
/dev/vmmon[4355]: host clock rate change request 0 -> 19
/dev/vmmon[4355]: host clock rate change request 19 -> 63
/dev/vmmon[4355]: host clock rate change request 63 -> 200
/dev/vmmon[4355]: host clock rate change request 200 -> 201
/dev/vmmon[4355]: host clock rate change request 201 -> 1001

>> any rate they want (depends on the hw capabilities).  i recall seeing
>> rates >1200Hz when you launch some of the media player apps -- sorry i
>> forget the exact number.

I have seen some apps which seem to schedule themselves using some kind of
SCHED_FIFO and therefore seem to get good RT:

from an ini file...
  # This option determines the multi-tasking capabilities of WinDEU.
  # The priority determines the minimum number of milliseconds WinDEU
  # will work before giving control back to Windows.
  # For example, if you set it to 20, it means WinDEU will gives
  # back control to Windows approximately (at most) 50 times a second.
  # A value of 0 means WinDEU WON'T multi-task.
  # (Can be changed in the preferences dialog box.)
  BuildPriority=25



Jan Engelhardt
-- 


[RFC][PATCH] don't bind to PCI express links [8/9]

2005-07-14 Thread Adam Belay
This patch prevents the PCI<->PCI bridge driver from binding to PCI
express devices.  This is needed to coexist with the PCI express root
port driver.  Eventually we may want to rework and better integrate
linux PCI express link support, but for now this should work.

Signed-off-by: Adam Belay <[EMAIL PROTECTED]>

--- a/drivers/pci/bus/pci-bridge.c  2005-07-14 02:30:09.0 -0400
+++ b/drivers/pci/bus/pci-bridge.c  2005-07-14 02:46:12.0 -0400
@@ -132,6 +132,10 @@
 	if (dev->subordinate)
 		return -ENODEV;
 
+	/* don't bind to pci express links */
+	if (pci_find_capability(dev, PCI_CAP_ID_EXP))
+		return -ENODEV;
+
 	bus = ppb_detect_bus(dev);
 	if (!bus)
 		return -ENODEV;
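
For readers unfamiliar with the capability check used above, here is a
simplified userspace model of what a capability lookup does. It walks
the capability linked list in a 256-byte copy of a device's
configuration space. This is a sketch and not the kernel's
implementation; find_capability() and the macro names are local to this
example, while the offsets and the PCI Express capability ID (0x10)
follow the PCI register layout.

```c
#include <stdint.h>

#define CFG_STATUS	0x06	/* status register, low byte */
#define CFG_STATUS_CAPS	0x10	/* "capabilities list present" bit */
#define CFG_CAP_PTR	0x34	/* offset of the first capability */
#define CAP_ID_EXP	0x10	/* PCI Express capability ID */

/*
 * Return the config-space offset of capability `cap_id`,
 * or 0 if the device does not advertise it.
 */
uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	int ttl = 48;	/* bound the walk in case the list loops */
	uint8_t pos;

	if (!(cfg[CFG_STATUS] & CFG_STATUS_CAPS))
		return 0;

	pos = cfg[CFG_CAP_PTR] & 0xFC;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)		/* byte 0: capability ID */
			return pos;
		pos = cfg[pos + 1] & 0xFC;	/* byte 1: next pointer */
	}
	return 0;
}
```

A nonzero result for CAP_ID_EXP corresponds to the case in which the
patch makes the bridge driver decline the device.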




[RFC][PATCH] master abort on scanning fixes [6/9]

2005-07-14 Thread Adam Belay
The PCI bridge driver now checks if changing bridge_ctl is necessary.
It also restores the original bridge_ctl settings when finished scanning
for devices.  Finally, a pci_bus setup fix is included.

Signed-off-by: Adam Belay <[EMAIL PROTECTED]>

--- a/drivers/pci/bus/pci-bridge.c  2005-07-12 01:45:46.0 -0400
+++ b/drivers/pci/bus/pci-bridge.c  2005-07-14 02:09:15.0 -0400
@@ -30,7 +30,7 @@
 	bus->bridge = &dev->dev;
 	bus->ops = bus->parent->ops;
 	bus->sysdata = bus->parent->sysdata;
-	bus->bridge = get_device(&dev->dev);
+	bus->self = dev;
 
 	/* Set up default resource pointers and names.. */
 	for (i = 0; i < 4; i++) {
@@ -82,12 +82,7 @@
 	if (!bus)
 		return NULL;
 
-	/* Disable MasterAbortMode during probing to avoid reporting
-	 * of bus errors (in some architectures)
-	 */
 	pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
-	pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
-			      bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
 
 	bus->number = bus->secondary = busnr;
 	bus->primary = buses & 0xFF;
@@ -105,10 +100,22 @@
 {
 	unsigned int devfn;
 
+	/* Disable MasterAbortMode during probing to avoid reporting
+	 * of bus errors (in some architectures)
+	 */
+	if (!(bus->bridge_ctl & PCI_BRIDGE_CTL_MASTER_ABORT))
+		pci_write_config_word(bus->self, PCI_BRIDGE_CONTROL,
+			bus->bridge_ctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
+
 	/* Go find them, Rover! */
 	for (devfn = 0; devfn < 0x100; devfn += 8)
 		pci_scan_slot(bus, devfn);
 
+	/* restore the original bridge_ctl configuration */
+	if (!(bus->bridge_ctl & PCI_BRIDGE_CTL_MASTER_ABORT))
+		pci_write_config_word(bus->self, PCI_BRIDGE_CONTROL,
+				      bus->bridge_ctl);
+
 	pcibios_fixup_bus(bus);
 	pci_bus_add_devices(bus);
 }
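
The save/disable/restore idea from the description can be modelled in
userspace as follows. This is a sketch of one reading of the
description, not the driver code: the bridge control register is a plain
variable, and struct mock_bridge, mock_write(), note_ctl() and
scan_quietly() are hypothetical names. The sketch touches the register
only when the master-abort bit is actually set, since that is the only
case in which clearing it changes anything.

```c
#include <stdint.h>

#define BRIDGE_CTL_MASTER_ABORT 0x20	/* same bit value as the PCI define */

struct mock_bridge {
	uint16_t reg;			/* simulated bridge control register */
	uint16_t seen_during_scan;	/* register value observed while scanning */
	unsigned int writes;		/* number of register writes issued */
};

static void mock_write(struct mock_bridge *b, uint16_t val)
{
	b->reg = val;
	b->writes++;
}

/* Example scan callback: records what the register held during the scan. */
void note_ctl(struct mock_bridge *b)
{
	b->seen_during_scan = b->reg;
}

/* Run `scan` with master-abort reporting off, then restore the old value. */
void scan_quietly(struct mock_bridge *b, void (*scan)(struct mock_bridge *))
{
	uint16_t saved = b->reg;
	int changed = (saved & BRIDGE_CTL_MASTER_ABORT) != 0;

	if (changed)	/* write only if the bit needs clearing */
		mock_write(b, (uint16_t)(saved & ~BRIDGE_CTL_MASTER_ABORT));

	scan(b);

	if (changed)	/* put the original setting back */
		mock_write(b, saved);
}
```

If the bit is clear to begin with, no register writes are issued at all.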

