date:20070426

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Christoph Lameter

On Thu, 26 Apr 2007, Andrew Morton wrote:

> > Or make sure that truncate
> > doesn't race on a partial *block* truncate?
> 
> lock four pages

You would only lock a single higher order block. Truncate works on that 
level.

If you have 4 separate pages then you need to take separate locks and you 
may not have contiguous memory which makes the filesystem run through all 
sorts of hoops.

> I'm not saying it's especially simple, nor fast.  But it has the advantage
> that we're not forced to use larger pages with _its_ attendant performance
> problems.

The patch is not about forcing the use of large pages but about offering the
option to use larger pages. It's a new flexibility.

> And it doesn't introduce a rather nasty hack of pretending (in some places)
> that pages are larger than they really are.

They are really larger. One page struct controls it all.

> And it has the very significant advantage that it doesn't introduce brand
> new concepts and some complexity into core MM.

The patchset would reduce complexity and make the page cache easier to
handle. It gets rid of the hacks currently needed to support larger block
sizes. It's straightforward, with no new locking, and very much a cleanup
patch.

> And make no mistake: the latter disadvantage is huge.  Because if we do the
> PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*. 
> Maintaining and enhancing core MM and VFS becomes harder and more costly
> and slower and more buggy *for ever*.  The ramp for people to become
> competent on core MM becomes longer.  Our developer pool becomes smaller, and
> proportionally less skilled.

No, it becomes easier. Look at the patchset. It cleans up a huge mess.
What is hacky about it? It is consistently using larger pages for the page 
cache and it integrates nicely into the VM.

> And hardware gets better.  If Intel & AMD come out with a 16k pagesize
> option in a couple of years we'll look pretty dumb.  If the problems which
> you're presently having with that controller get sorted out in the next
> generation of the hardware, we'll also look pretty dumb.

We are currently looking dumb and unable to deal with the hardware. Yes 
we can pressure the hardware vendors to produce hardware conforming to our 
specifications but I always thought that was how another company operates.

> As always, there are tradeoffs.  We can see the cons, and they are very
> significant.  We don't yet know the pros.  Perhaps they will be similarly
> significant.  But I don't believe that the larger PAGE_CACHE_SIZE hack
> (sorry) is the only way in which they can be realised.

It is the most consistent solution, and it avoids the proliferation of
further hacks to address the large blocksize.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work

2007-04-26 Thread Jarek Poplawski

On Thu, Apr 26, 2007 at 08:44:36PM +0400, Oleg Nesterov wrote:
> On 04/26, Jarek Poplawski wrote:
> >
> > On Wed, Apr 25, 2007 at 04:47:14PM +0400, Oleg Nesterov wrote:
> > ...
> > > > > > +   spin_lock_irq(&cwq->lock);
> > > > > > +   /* CPU_DEAD in progress may change cwq */
> > > > > > +   if (likely(cwq == get_wq_data(work))) {
> > > > > > +       list_del_init(&work->entry);
> > > > > > +       __set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
> > > > > > +       retry = try_to_del_timer_sync(&dwork->timer) < 0;
> > > > > > +   }
> > > > > > +   spin_unlock_irq(&cwq->lock);
> > > > > > +   } while (unlikely(retry));
> > > 
> > > >  1. If delayed_work_timer_fn of this work is fired and is waiting
> > > >  on the above spin_lock then, after above spin_unlock, the work
> > > >  will be queued.
> > > 
> > > No, in that case try_to_del_timer_sync() returns -1.
> > 
> > Yes. But I think it's safe only after moving work_clear_pending
> > in run_workqueue under a lock; probably otherwise there is a
> > possibility this flag could be cleared, after above unlock.  
> 
> It doesn't matter in this particular case because we are going to retry
> anyway. But yes, this patch moves work_clear_pending() under lock, because
> otherwise it could be cleared by run_workqueue() if this work is about
> to be executed, but was already deleted from list.

...and that seems to be the same thing I meant...
I only wanted us to agree (by now it matters only for historical reasons)
that the lock around _PENDING could matter in run_workqueue.

Jarek P.

Re: 2.6.21 reiserfs -- circular locking?

2007-04-26 Thread Andrew Morton

On Thu, 26 Apr 2007 16:40:14 -0700 David Brownell <[EMAIL PROTECTED]> wrote:

> This might be a Heisenberg, but I figure it's worth posting
> in case anyone else sees similar oddness.  Never seen it
> before or since.  It's as if a gremlin got annoyed with me
> for switching a filesystem from reiser to ext3.  :)
> 
> - Dave
> 
> 
> ===
> [ INFO: possible circular locking dependency detected ]
> 2.6.21-git #6
> ---
> vi/4556 is trying to acquire lock:
>  (_SB(s)->xattr_dir_sem){..--}, at: [] 
> reiserfs_chown_xattrs+0x5b/0x128
> 
> but task is already holding lock:
>  (>i_mutex){--..}, at: [] chown_common+0x93/0xb3
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (>i_mutex){--..}:
>[] __lock_acquire+0x9f7/0xbaa
>[] get_xa_root+0x49/0x107
>[] lock_acquire+0x7b/0x9f
>[] get_xa_root+0x49/0x107
>[] save_trace+0x40/0x9e
>[] __mutex_lock_slowpath+0xd8/0x281
>[] _spin_unlock_irq+0x24/0x4a
>[] get_xa_root+0x49/0x107
>[] open_xa_dir+0x1c/0xf8
>[] __down_read+0x34/0x9d
>[] reiserfs_delete_xattrs+0x64/0x185
>[] _atomic_dec_and_lock+0x14/0x34
>[] reiserfs_delete_inode+0x38/0xae
>[] generic_delete_inode+0x64/0xf5
>[] reiserfs_delete_inode+0x0/0xae
>[] generic_delete_inode+0x7a/0xf5
>[] do_unlinkat+0xd9/0x14f
>[] trace_hardirqs_on+0x123/0x14d
>[] trace_hardirqs_on_thunk+0x35/0x37
>[] system_call+0x7e/0x83
>[] 0x
> 
> -> #0 (_SB(s)->xattr_dir_sem){..--}:
>[] print_circular_bug_header+0xcc/0xd3
>[] __lock_acquire+0x8f3/0xbaa
>[] reiserfs_chown_xattrs+0x5b/0x128
>[] lock_acquire+0x7b/0x9f
>[] reiserfs_chown_xattrs+0x5b/0x128
>[] down_read+0x32/0x3b
>[] reiserfs_chown_xattrs+0x5b/0x128
>[] __capable+0x9/0x1d
>[] reiserfs_setattr+0x11e/0x1ec
>[] current_fs_time+0x35/0x3a
>[] notify_change+0x122/0x231
>[] chown_common+0x9e/0xb3
>[] fget+0x88/0xa7
>[] sys_fchown+0x30/0x47
>[] system_call+0x7e/0x83
>[] 0x
> 
> other info that might help us debug this:
> 
> 1 lock held by vi/4556:
>  #0:  (>i_mutex){--..}, at: [] chown_common+0x93/0xb3
> 
> stack backtrace:
> 
> Call Trace:
>  [] print_circular_bug_tail+0x69/0x72
>  [] print_circular_bug_header+0xcc/0xd3
>  [] __lock_acquire+0x8f3/0xbaa
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] lock_acquire+0x7b/0x9f
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] down_read+0x32/0x3b
>  [] reiserfs_chown_xattrs+0x5b/0x128
>  [] __capable+0x9/0x1d
>  [] reiserfs_setattr+0x11e/0x1ec
>  [] current_fs_time+0x35/0x3a
>  [] notify_change+0x122/0x231
>  [] chown_common+0x9e/0xb3
>  [] fget+0x88/0xa7
>  [] sys_fchown+0x30/0x47
>  [] system_call+0x7e/0x83
> 

cc added.  This was also reported against -rc7-mm1 (or 2)

[PATCH] change global zonelist order v4 [0/2]

2007-04-26 Thread KAMEZAWA Hiroyuki

Hi, this is version 4. It includes Lee Schermerhorn's good rework
and automatic configuration at boot time.

(This patch is reworked from V2, so skip V3 changelog.)

ChangeLog V2 -> V4
- automatic configuration is added.
- automatic configuration is now default.
- relaxed_zone_order is renamed to be numa_zonelist_order
  you can specify value "default" , "zone" , "numa"
- clean-up from Lee Schermerhorn
- patch is separated into "base" and "autoconfiguration algorithm"

Changelog from V1 -> V2
- sysctl name is changed to be relaxed_zone_order
- NORMAL->NORMAL->->DMA->DMA->DMA order (new ordering) is now default.
  NORMAL->DMA->NORMAL->DMA order (old ordering) is optional.
- adds boot option to set relaxed_zone_order. ia64 is supported now.
- Added documentation
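
To make the two orderings named in the V1 -> V2 changelog concrete, here is a
minimal userspace sketch, not code from the patch. It assumes a hypothetical
machine where every node has both a NORMAL and a DMA zone; the function name
build_zonelist and the zone_major flag are invented for illustration and do
not claim to match how the patch maps the "zone" and "numa" values.

```c
#include <stdio.h>

/* Zone types, most preferred first. Simplified: only two zones here. */
static const char *const zone_names[] = { "NORMAL", "DMA" };
#define NR_ZONES 2

/*
 * Write a fallback order into out as "ZONE(node)->ZONE(node)->...".
 * zone_major != 0: all NORMAL zones first, then all DMA zones.
 * zone_major == 0: every zone of node 0, then every zone of node 1, ...
 * Assumes (hypothetically) that each node has every zone.
 */
void build_zonelist(int nr_nodes, int zone_major, char *out, size_t len)
{
    int outer_max = zone_major ? NR_ZONES : nr_nodes;
    int inner_max = zone_major ? nr_nodes : NR_ZONES;
    size_t used = 0;

    if (len == 0)
        return;
    out[0] = '\0';

    for (int outer = 0; outer < outer_max; outer++) {
        for (int inner = 0; inner < inner_max; inner++) {
            int zone = zone_major ? outer : inner;
            int node = zone_major ? inner : outer;
            int n = snprintf(out + used, len - used, "%s%s(%d)",
                             used ? "->" : "", zone_names[zone], node);

            if (n < 0 || (size_t)n >= len - used)
                return; /* buffer too small: stop, out stays NUL-terminated */
            used += (size_t)n;
        }
    }
}
```

With two nodes, the zone-major call gives NORMAL(0)->NORMAL(1)->DMA(0)->DMA(1),
so DMA memory is touched only after NORMAL memory on every node is exhausted;
the node-major call gives NORMAL(0)->DMA(0)->NORMAL(1)->DMA(1), which keeps
allocations local at the cost of eating into the local DMA zone earlier.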


Please don't hesitate to rework this if you have a good plan.
I'll be offline next week because my office will be closed.
Lee-san, please Ack or Sign-off if the patches seem O.K.

I think my autoconfiguration logic is reasonable to some extent, but it may
need some discussion. It can easily be rewritten by an additional patch.

Thanks.
-Kame


Re: menuconfig issue (checklist) in 2.6.20.7 & 2.6.21-rc7 ?

2007-04-26 Thread Jan Engelhardt


On Apr 26 2007 05:06, Mike Galbraith wrote:
>On Wed, 2007-04-25 at 22:30 +0200, Sam Ravnborg wrote:
>
>> > There are general funnies in the menuconfig world (my preference) here.
>> > For instance, I recently had reason to change/test different default IO
>> > schedulers, and found that no matter what I did, I couldn't select a
>> > default IO scheduler any more, though I used to be able to do so.
>
>> Tried it now with latest -git from Linus and here it works.
>> Notice that you need to make the scheduler a built-in <*>
>> before you can select it as default.
>> A scheduler selected as a module  cannot be made default.
>
>Ok, I guess my ncurses is ill.  (all built in)  Thanks.

CONFIG_MODULES=n?


Jan
-- 

Re: Back to the future.

2007-04-26 Thread Pekka Enberg


On 4/27/07, Pavel Machek <[EMAIL PROTECTED]> wrote:

> Now, it would be _very_ nice to be able to snapshot system and
> continue running, but I just don't see how to do it without extensive
> filesystem support.


So what kind of support do we need from the filesystem?

 Pekka

RE: Pin an IRQ to a particular CPU

2007-04-26 Thread David Schwartz


> Hello,
> 
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU.  Is this possible?
> 
> TIA,
> Phy

/proc/irq/*/smp_affinity

I would recommend automatic balancing and leave it at that. This code 
automatically distributes IRQs to CPUs such that each IRQ can only be serviced 
by a single CPU and the IRQs are distributed over the CPU set.
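
For the manual route, /proc/irq/<irq>/smp_affinity takes a hexadecimal CPU
bitmask. The following is a minimal sketch of writing it from C; the helper
names format_affinity_mask and pin_irq_to_cpu are made up for this example,
and it only handles CPUs 0..31 so that one 32-bit word is enough.

```c
#include <stdio.h>

/*
 * Format the hex bitmask for a single CPU, e.g. CPU 2 -> "4".
 * Sketch limitation: only CPUs 0..31 are accepted.
 * Returns 0 on success, -1 on bad input or a too-small buffer.
 */
int format_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
    int n;

    if (cpu >= 32 || len == 0)
        return -1;
    n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Pin one IRQ to one CPU by writing the mask to procfs.
 * Needs root, and the write can fail for IRQs whose affinity is fixed.
 */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];
    FILE *f;
    int rc;

    if (format_affinity_mask(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);

    f = fopen(path, "w");
    if (!f)
        return -1;
    rc = (fprintf(f, "%s\n", mask) < 0) ? -1 : 0;
    if (fclose(f) != 0) /* a rejected write may only show up at flush time */
        rc = -1;
    return rc;
}
```

Note that a running balancer daemon may later overwrite a mask set this way,
so manual pinning and automatic balancing should not be mixed for the same IRQ.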

DS



Re: MAINTAINERS file out of date?

2007-04-26 Thread H. Peter Anvin

Eric W. Biederman wrote:
> "H. Peter Anvin" <[EMAIL PROTECTED]> writes:
> 
>> A lot of that code (although, of course, not all) could be written in C,
>> though.  I'm thinking of taking a stab at rewriting it that way.
> 
> Is this using the .code16gcc?  Or are you thinking of some other
> technique.  Requiring another C compiler to build the kernel would
> be a pain to use.

.code16gcc was what I was using.  There is a GSoC project that I'm
mentoring to get 16-bit support for gcc, that will be possible to
eventually migrate to (for code size) if/when it gets implemented and
gets pushed out far enough, but that's for the future.

-hpa

Re: MAINTAINERS file out of date?

2007-04-26 Thread Eric W. Biederman

"H. Peter Anvin" <[EMAIL PROTECTED]> writes:

> A lot of that code (although, of course, not all) could be written in C,
> though.  I'm thinking of taking a stab at rewriting it that way.

Is this using the .code16gcc?  Or are you thinking of some other
technique.  Requiring another C compiler to build the kernel would
be a pain to use.

Eric

Re: Pin an IRQ to a particular CPU

2007-04-26 Thread Jan Engelhardt

On Apr 26 2007 18:15, Phy Prabab wrote:
> Hello,
>
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU.  Is this possible?

Sorry, too early in the morning :)
It is definitely possible, since that is what irqbalance does.
(Not that I know the exact details right now...)
the other URL I posted might give some useful hints too, though.

Jan
-- 

Re: Pin an IRQ to a particular CPU

2007-04-26 Thread Jan Engelhardt


> Hello,
>
> In an effort to increase over all throughput of my Linux NFS file
> server, I thought about trying to force an IRQ, for the NIC, to be
> serviced by a particular CPU.  Is this possible?

http://lkml.org/lkml/2007/4/5/155


Jan
-- 

Re: ZFS with Linux: An Open Plea

2007-04-26 Thread Valerie Henson

On Wed, Apr 18, 2007 at 01:25:19PM -0400, Lennart Sorensen wrote:
> 
> Does it matter that google's recent report on disk failures indicated
> that SMART never predicted anything useful as far as they could tell?
> Certainly none of my drive failures ever had SMART make any kind of
> indication that anything was wrong.

I saw that talk, and that's not what I got out of it.  They found that
SMART error reports _did_ correlate with drive failure.  See page 8
of:

http://www.usenix.org/events/fast07/tech/full_papers/pinheiro/pinheiro.pdf

(If you're not a USENIX member, you may be able to find a free
download copy elsewhere.)

However, they found that the correlation was not strong enough to make
it economically feasible to replace disks reporting SMART failures,
since something like 70% of disks were still working a year after the
first failure report.  Also, they found that some disks failed without
any SMART error reports.

Now, Google keeps multiple copies (3 in GoogleFS, last I heard) of
data, so for them, "economically feasible" means something different
than for my personal laptop hard drive.  I have twice had my laptop
hard drive start spitting SMART errors and then die within a week.  It
is economically quite sensible for me to replace my laptop drive once
it has an error, since I don't carry around 3 laptops everywhere I go.

-VAL

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Jens Axboe

On Thu, Apr 26 2007, Mel Gorman wrote:
> On (26/04/07 20:39), Jens Axboe didst pronounce:
> > On Thu, Apr 26 2007, Christoph Lameter wrote:
> > > On Thu, 26 Apr 2007, Jens Axboe wrote:
> > > 
> > > > On Thu, Apr 26 2007, Christoph Lameter wrote:
> > > > > On Thu, 26 Apr 2007, Jens Axboe wrote:
> > > > > 
> > > > > > The above can be implemented fairly cleanly, and on a need-to-have
> > > > > > basis. It's not something that'll break drivers.
> > > > > 
> > > > > But its also not going to fix the hacks that we have in the kernel 
> > > > > to deal with > PAGE_SIZE i/o.
> > > > 
> > > > No, but that's a _seperate_ issue! Don't keep mixing up the two.
> > > 
> > > Yes I understand that you want it to be a separate issue so we get get 
> > > more rationales for the hacks that we do to avoid the large 
> > > order allocations.
> > 
> > Christoph, don't take your frustrations out on me. I've several times in
> > this thread said that I'd LIKE to have > PAGE_SIZE support in the page
> > cache. I WROTE the initial pktcdvd driver that is a primary example of
> > these hacks, I'm very well aware of the pain and bugs involved with
> > that.
> > 
> > But don't push large pages as the only solution to larger ios, because
> > that is trivially not true.
> > 
> 
> Would it be fair to say that your approach and using large pages are not
> mutually exclusive solutions? It seems a lot of the debate here is
> assuming there is One And Only One Solution for larger ios.

Definitely, there's zero reason they cannot coexist.

-- 
Jens Axboe


Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work

2007-04-26 Thread Jarek Poplawski

On Thu, Apr 26, 2007 at 08:34:06PM +0400, Oleg Nesterov wrote:
> On 04/26, Jarek Poplawski wrote:
> >
> > >   void cancel_rearming_delayed_work(struct delayed_work *dwork)
> > >   {
> > >   struct work_struct *work = &dwork->work;
> > >   struct cpu_workqueue_struct *cwq = get_wq_data(work);
> > >   int done;
> > 
> > I don't understand, why you think cwq cannot be NULL here.
> 
> sure it can, this is just a template.
> 
> > > 
> > >   do {
> > >   done = 1;
> > >   spin_lock_irq(&cwq->lock);
> > > 
> > >   if (!list_empty(&work->entry))
> > >   list_del_init(&work->entry);
> > 
> > BTW, isn't needs_a_good_name needles after this and after del_timer 
> > positive?
> 
> no, we still need it. work->func() may be running on another CPU as well.
> 
> > 
> > >   else if (test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
> > >   done = del_timer(&dwork->timer)
> > 
> > If this runs while a work function is fired in run_workqueue,
> > it sets _PENDING bit, but if the work skips rearming, we have probably
> > endless loop, again.
> 
> No, if the work skips rearming (or didn't yet), we set WORK_STRUCT_PENDING
> successfully.

Sorry! Should be:
"If this runs while a work function is fired in run_workqueue,
it sets _PENDING bit, but if the work skips rearming, I have probably
endless loop, again."

> 
> >  It is something alike to the current
> > way, with some added measures: you try to shoot a work on the run,
> > while queued or timer_pending, plus the _PENDING flag set, so it seems,
> > there is some risk of longer than planed looping.
> 
> Sorry, can't understand. done == 0 means that the queueing in progress,
> this work should be placed on cwq->worklist very soon, most probably
> right after we drop cwq->lock.

I think that, theoretically, there may be some strange case in which this
function gets the spin_lock only when: list_empty(&work->entry) == 1
&& _PENDING == 1 && del_timer(&dwork->timer) == 0.

> 
> > I have to look at this more, at home and, if something new, I'll write
> > tomorrow. So, the good news, is you should have enough sleep this time!
> 
> Thanks for review!

OK. Here is the review:

It looks great!!! I cannot believe, it could be so "easy"!

Regards,
Jarek P.

PS: probably unusable, but for my own satisfaction:

Acked-by: Jarek Poplawski <[EMAIL PROTECTED]>

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Andrew Morton

On Fri, 27 Apr 2007 14:20:46 +1000 David Chinner <[EMAIL PROTECTED]> wrote:

> >blocksizes via this scheme - instantiate and lock four pages and go for
> >it.
> 
> So now how do you get block aligned writeback?

in writeback and pageout:

if (page->index & mapping->block_size_mask)
continue;
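
As a rough illustration of the mask test above (a sketch only: the message's
mapping->block_size_mask is a proposed field, and the helper names below are
invented), the mask is the number of pages per block minus one, so the test
skips every page that is not the first page of its block:

```c
#include <stdbool.h>

/*
 * pages_per_block must be a power of two (4 in the "lock four pages" case).
 * The mask keeps the low bits of the page index; they are all zero only
 * for the first page of each block.
 */
unsigned long block_size_mask(unsigned long pages_per_block)
{
    return pages_per_block - 1;
}

/* True if the page at this index is the first page of a filesystem block. */
bool is_block_head(unsigned long index, unsigned long pages_per_block)
{
    return (index & block_size_mask(pages_per_block)) == 0;
}
```

With four pages per block the mask is 3, so writeback would start only at page
indexes 0, 4, 8 and so on, and would then have to cover the following three
pages itself.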

> Or make sure that truncate
> doesn't race on a partial *block* truncate?

lock four pages

> You basically have to
> jump through nasty, nasty hoops, to handle corner cases that are introduced
> because the generic code can no longer reliably lock out access to a
> filesystem block.
> 
> Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
> doing everything inside the filesystem because it's the only sane
> way to serialise access to these aggregated structures. This is
> the way XFS used to work in its data path, and we all know how long
> and loud people complained about that.
> 
> A filesystem specific aggregation mechanism is not a palatable solution
> here because it drives filesystems away from being able to use generic
> code. 

I would expect we could (should) implement this in generic code by
modifying the existing stuff.

I'm not saying it's especially simple, nor fast.  But it has the advantage
that we're not forced to use larger pages with _its_ attendant performance
problems.

And it will benefit all filesystems immediately.

And it doesn't introduce a rather nasty hack of pretending (in some places)
that pages are larger than they really are.

And it has the very significant advantage that it doesn't introduce brand
new concepts and some complexity into core MM.

And make no mistake: the latter disadvantage is huge.  Because if we do the
PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*. 
Maintaining and enhancing core MM and VFS becomes harder and more costly
and slower and more buggy *for ever*.  The ramp for people to become
competent on core MM becomes longer.  Our developer pool becomes smaller, and
proportionally less skilled.

And hardware gets better.  If Intel & AMD come out with a 16k pagesize
option in a couple of years we'll look pretty dumb.  If the problems which
you're presently having with that controller get sorted out in the next
generation of the hardware, we'll also look pretty dumb.

As always, there are tradeoffs.  We can see the cons, and they are very
significant.  We don't yet know the pros.  Perhaps they will be similarly
significant.  But I don't believe that the larger PAGE_CACHE_SIZE hack
(sorry) is the only way in which they can be realised.


Re: MAINTAINERS file out of date?

2007-04-26 Thread H. Peter Anvin

Eric W. Biederman wrote:
> WANG Cong <[EMAIL PROTECTED]> writes:
> 
>> Thanks for your point. 
>> I know little about virtualization, maybe can't help much. But I am 
>> interested
>> in other things you mentioned. AFAIK, segments can't be avoided on i386, and
>> Linux uses them very little, how are they recalculated constantly?
> 
> Look in arch/i386/boot/setup.S it runs in 16bit mode.  We are talking about
> real mode segments not 16bit segments.
> 
> 16bit real mode is a completely different ball game, and why we keep BIOS
> calls isolated to that one dinky file.
> 

A lot of that code (although, of course, not all) could be written in C,
though.  I'm thinking of taking a stab at rewriting it that way.

-hpa

Re: [2.6 patch] include/linux/kprobes.h: always offer show_registers() prototype

2007-04-26 Thread Ananth N Mavinakayanahalli

On Fri, Apr 27, 2007 at 01:47:49AM +0200, Adrian Bunk wrote:
> Allow gcc to perform show_registers() type checking also with 
> CONFIG_KPROBES=n.

Is kprobes.h the correct place to allow for this change? Perhaps, with
Christoph's patch http://marc.info/?l=linux-kernel&m=117432009501114&w=2
consolidating the die_notifier code, this could be moved to
include/linux/kdebug.h instead?

Ananth

> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
> This patch has been sent on:
> - 27 Mar 2007
> 
>  include/linux/kprobes.h |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> --- linux-2.6.21-rc4-mm1/include/linux/kprobes.h.old  2007-03-26 
> 15:28:27.0 +0200
> +++ linux-2.6.21-rc4-mm1/include/linux/kprobes.h  2007-03-26 
> 15:32:23.0 +0200
> @@ -163,7 +163,6 @@
>  extern void arch_arm_kprobe(struct kprobe *p);
>  extern void arch_disarm_kprobe(struct kprobe *p);
>  extern int arch_init_kprobes(void);
> -extern void show_registers(struct pt_regs *regs);
>  extern kprobe_opcode_t *get_insn_slot(void);
>  extern void free_insn_slot(kprobe_opcode_t *slot, int dirty);
>  extern void kprobes_inc_nmissed_count(struct kprobe *p);
> @@ -241,4 +240,7 @@
>  {
>  }
>  #endif   /* CONFIG_KPROBES */
> +
> +void show_registers(struct pt_regs *regs);
> +
>  #endif   /* _LINUX_KPROBES_H */

Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-26 Thread Valerie Henson

On Thu, Apr 26, 2007 at 10:47:38AM +0200, Jan Kara wrote:
>   Do I get it right that you just have in each cnode a pointer to the
> previous & next cnode? But then if two consecutive cnodes get corrupted,
> you have no way to connect the chain, do you? If each cnode contained
> some unique identifier of the file and a number identifying position of
> cnode,  then there would be at least some way (though expensive) to
> link them together correctly...

You're right, it's easy to add a little more redundancy that would
make it possible to recover from two consecutive nodes being
corrupted.  Keeping a parent inode id in each continuation inode is
definitely a smart thing to do.

Some minor side notes: Continuation inodes aren't really in any
defined order - if you look at Jeff's ping-pong chunk allocation
example, you'll see that the data in each continuation inode won't be
in linearly increasing order.  Also, while the current implementation
is a simple doubly-linked list, this may not be the best solution
long-term.  What's important is that each continuation inode have a
back pointer to the parent and that there is some structure for
quickly looking up the continuation inode for a given file offset.
Suggestions for data structures that work well in this situation are
welcome. :)
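
One possible shape for such a lookup structure, offered only as a sketch and
not as what chunkfs does: keep a per-file array of extents sorted by file
offset, each naming the continuation inode that holds it, and binary-search
it. All struct and function names below are hypothetical, and inode number 0
is assumed to be unused so it can stand for a hole.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous range of the file and the continuation inode holding it. */
struct cont_extent {
    uint64_t start;    /* first file offset covered */
    uint64_t len;      /* number of bytes covered */
    uint64_t cont_ino; /* continuation inode that owns this range */
};

struct file_map {
    uint64_t parent_ino;               /* what every back pointer refers to */
    const struct cont_extent *extents; /* sorted by start, non-overlapping */
    size_t nr;
};

/* Binary search; returns the owning continuation inode, or 0 for a hole. */
uint64_t lookup_cont_ino(const struct file_map *map, uint64_t offset)
{
    size_t lo = 0, hi = map->nr;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct cont_extent *e = &map->extents[mid];

        if (offset < e->start)
            hi = mid;
        else if (offset - e->start >= e->len)
            lo = mid + 1;
        else
            return e->cont_ino;
    }
    return 0;
}
```

The same continuation inode may appear in several non-adjacent extents, which
is what the ping-pong allocation pattern produces. An on-disk version would
more likely be a tree than a flat array, but the lookup idea is the same.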

-VAL

Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.

2007-04-26 Thread Eric W. Biederman

Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

>> I suspect what we want to do is come up with a function to call
>> to test to see if a page should be read-only and map such pages
>> _PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.
>>   
>
> Hm, I think that's a hard function to write in general. For the special
> case of pagetable_init it wouldn't be too hard, but it doesn't seem like
> a big improvement over the current state of affairs.

There is some difficulty there, and I need to look at the issue some more
but it just occurred to me that this problem of tracking special permissions
on pages is not confined to Xen.

Currently for pages we need to have a consistent global view of whether a
page is uncached, write-combining or cached.  We need a way to check this so
that we don't get inconsistencies in how we are caching pages, when
we start controlling this on a per page basis.

Currently we are not doing this and it is blocking merging of PAT
write-combining support in the kernel because of the bad things
(silent data corruption and general cpu undefined behavior) that can
happen if we don't handle things consistently.

I don't know if we can solve the two problems with the same mechanism
but it is worth looking into.

Especially if what we do is just add support for marking some pages
as read-only.  We could use that to ensure there are no writable
mappings of the kernel's text segment, for example.

Eric

Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-26 Thread Valerie Henson

On Thu, Apr 26, 2007 at 12:05:04PM -0400, Jeff Dike wrote:
> 
> No, I'm referring to a different file.  The scenario is that you have
> a growing file in a nearly full disk with files being deleted (and
> thus space being freed) such that allocations for the growing file
> bounce back and forth between chunks.

This is an excellent question.  I call this the ping-pong problem.
The solution is as Amit describes: You have a maximum of one
continuation inode per file per chunk, and you require sparse files.
Here's an example, spelled out:

Allocate file 1 in chunk A.
Grow file 1.
Chunk A fills up.
Allocate continuation inode for file 1 in chunk B.
Chunk A gets some free space.
Chunk B fills up.
Pick chunk A for allocating next block of file 1.
Try to look up a continuation inode for file 1 in chunk A.
Continuation inode for file 1 found in chunk A!
Attach newly allocated block to existing inode for file 1 in chunk A.

This is why the file format inside each chunk needs to support sparse
files.
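
The "at most one continuation inode per file per chunk" rule from the steps
above can be sketched as a lookup-or-create on a per-file table. This is a
userspace illustration under stated assumptions, not chunkfs code: the struct,
the fixed MAX_CHUNKS, and the counter standing in for an inode allocator are
all hypothetical.

```c
#include <stdint.h>

#define MAX_CHUNKS 8 /* arbitrary limit for the sketch */

/* Per-file table: at most one continuation inode per chunk (0 = none yet). */
struct chunk_file {
    uint64_t cont_ino[MAX_CHUNKS];
    uint64_t next_ino;      /* stand-in for a real allocator; must be nonzero */
    unsigned int nr_inodes; /* how many inodes this file has so far */
};

/*
 * Return the inode that a block newly allocated in `chunk` attaches to.
 * A continuation inode is created only if the file has none in that chunk,
 * so bouncing between two chunks never creates more than two inodes.
 * Returns 0 if chunk is out of range.
 */
uint64_t inode_for_chunk(struct chunk_file *f, unsigned int chunk)
{
    if (chunk >= MAX_CHUNKS)
        return 0;
    if (f->cont_ino[chunk] == 0) {
        f->cont_ino[chunk] = f->next_ino++;
        f->nr_inodes++;
    }
    return f->cont_ino[chunk];
}
```

Replaying the example as chunk A, chunk B, chunk A returns the same inode for
both visits to A. The blocks attached on the second visit sit at a higher file
offset than those in B, which is why the inode in A ends up with a hole.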

I have a presentation that has a series of slides on problems and
potential resolutions that might help:

http://infohost.nmt.edu/~val/review/chunkfs_presentation.pdf

-VAL

Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code

2007-04-26 Thread GOTO Masanori

At Thu, 26 Apr 2007 20:30:54 -0400,
James Bottomley wrote:
> On Fri, 2007-04-27 at 02:13 +0200, Adrian Bunk wrote:
> > On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote:
> > > On Fri, 27 Apr 2007, Adrian Bunk wrote:
> > > 
> > > > This patch removes kernel 2.4 code.
> > > >
> > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > > >
> > > > ---
> > > >
> > > > This patch has been sent on:
> > > > - 26 Mar 2007
> > > >
> > > >  drivers/scsi/nsp32.c |  109 +--
> > > >  1 file changed, 13 insertions(+), 96 deletions(-)
> > > >
> > > > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old   2007-03-25 
> > > > 20:27:34.0 +0200
> > > > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c   2007-03-25 
> > > > 20:31:59.0 +0200
> > > > @@ -49,10 +49,6 @@
> > > >  #include 
> > > >  #include 
> > > >
> > > > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> > > > -# include 
> > > > -#endif
> > > > -
> > > 
> > > i'm curious about the rules for removing code like this.  in the case
> > > of drivers, isn't it possible that some driver source could be
> > > relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
> > > that kind of preprocessor check to make sure it's being compiled
> > > appropriately?
> > 
> > That's what it was for.
> > 
> > > or are you doing something more sophisticated than simply checking the
> > > kernel version being tested?
> > 
> > No.
> > 
> > The point is:
> > 
> > It seems this driver was once maintained for both 2.4 and 2.6 in one 
> > file.
> > 
> > As long as this is done, such version checks are OK.
> 
> Personally, I don't like to see 2.4 and 2.6 in a new driver, and will
> tend to try to force it to be 2.6  only.  For an existing driver, I tend
> to be much more tolerant: removing the huge gobs of code to achieve 2.6
> only is usually a bit disruptive on both the driver and the maintainer
> 
> > But if a driver is no longer actually maintained for both kernels these 
> > checks become useless (and there quickly arised unconditional 2.6-only 
> > code in such a driver) and can be removed.
> 
> This driver is maintained by 
> 
> Yokota Hiroshi <[EMAIL PROTECTED]>
> GOTO Masanori <[EMAIL PROTECTED]>
> 
> As it says in the header.  It was last modified in May 2006, so it is
> maintained under the somewhat elastic standards of SCSI.  I've cc'd them
> to see what they think.

Thanks for forwarding the mail.

Personally, I don't mind removing the 2.4 compatibility code.

Historically, nsp32 was introduced in 2.4.20 and 2.5.40, so we needed
to maintain both kernel series at that time.  However, these days the
2.4 kernel is no longer the mainline code, and 2.6 is the standard one.
We don't need to keep paying attention to 2.4 compatibility issues.
Moreover, some other compatibility code has already been removed from
the 2.6 nsp32 driver by many cleanup patches (thanks!).  This removal
makes sense to me.
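
For readers unfamiliar with the version checks being removed, here is a rough user-space sketch of how they evaluate. The `KERNEL_VERSION()` encoding is the usual one from <linux/version.h>; the `LINUX_VERSION_CODE` value is a stand-in chosen for illustration (normally the build generates it), and `builds_24_compat_code()` is a made-up helper.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Stand-in for the build-generated value; assumed here to be 2.6.21. */
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 21)

/* Returns 1 if the 2.4-only branch would be compiled in, 0 otherwise. */
static int builds_24_compat_code(void)
{
#if (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 0))
    return 1;   /* 2.4 path: dead code when building inside a 2.6 tree */
#else
    return 0;   /* 2.6 path: the only one a 2.6 tree ever compiles */
#endif
}
```

Inside a 2.6 tree the first branch can never be selected, so dropping it does not change the compiled driver.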

Regards,
-- gotom

Re: Back to the future.

2007-04-26 Thread Pekka J Enberg

On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote:
> >which will map in the snapshot, return the mapped address and the size 
> >(and if you want to support snapshots > 4GB, be my guest, but I suspect 
> >you're actually *better* off just admitting that if you cannot shrink 
> >the snapshot to less than 32 bits, it's not worth doing)
 
On Fri, 27 Apr 2007, Nigel Cunningham wrote:
> That inherently limits the image to half of available ram (you need
> somewhere to store the snapshot), so you won't get the full image you
> express interest in below.

It doesn't. We can make the userspace mapped pages copy-on-write. As long 
as the userspace makes sure there's not much activity during 
snapshot/shutdown, we will be fine. What we probably do need to copy is 
kernel pages.
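
A toy user-space model of the copy-on-write idea, only to show why the snapshot need not take half of RAM up front. Everything here (`struct page`, `struct mapping`, `snapshot`, `cow_write`, the sizes) is hypothetical and far simpler than what the kernel would do; it is a sketch of the principle, not a proposed implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SNAP_PAGE_SIZE 4096
#define NPAGES         4      /* hypothetical image size, in pages */

struct page {
    int refs;                        /* how many mappings share this page */
    unsigned char data[SNAP_PAGE_SIZE];
};

struct mapping {
    struct page *pages[NPAGES];
};

/* Snapshot by sharing every page; no data is copied here. */
static void snapshot(const struct mapping *live, struct mapping *snap)
{
    for (int i = 0; i < NPAGES; i++) {
        snap->pages[i] = live->pages[i];
        snap->pages[i]->refs++;
    }
}

/* Store one byte into the live mapping. A page still shared with the
 * snapshot is copied first. Returns 1 if a copy was made, 0 if not,
 * -1 if the copy could not be allocated. */
static int cow_write(struct mapping *live, int idx, size_t off, unsigned char v)
{
    struct page *p;

    assert(idx >= 0 && idx < NPAGES && off < SNAP_PAGE_SIZE);
    p = live->pages[idx];
    if (p->refs > 1) {
        struct page *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy->data, p->data, SNAP_PAGE_SIZE);
        copy->refs = 1;
        p->refs--;
        live->pages[idx] = copy;
        copy->data[off] = v;
        return 1;
    }
    p->data[off] = v;
    return 0;
}
```

Extra memory is only consumed for pages that are written after the snapshot, which is why low activity during snapshot/shutdown keeps the cost small.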

Pekka

[PATCH] fix hrtimers documentation

2007-04-26 Thread Satoru Takeuchi

Fix hrtimers documentation.

- The word `patch' is used to refer to the hrtimers subsystem implementation.
  However, this subsystem is already in the upstream kernel, so convert
  those uses into appropriate phrases.
- Some minor cleanups.

Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]>

Index: linux-2.6.21/Documentation/hrtimers/hrtimers.txt
===
--- linux-2.6.21.orig/Documentation/hrtimers/hrtimers.txt   2007-04-27 
12:45:56.0 +0900
+++ linux-2.6.21/Documentation/hrtimers/hrtimers.txt2007-04-27 
13:24:22.0 +0900
@@ -1,8 +1,9 @@
 
 hrtimers - subsystem for high-resolution kernel timers
-
+--
 
-This patch introduces a new subsystem for high-resolution kernel timers.
+This document describes hrtimers, a new subsystem for high-resolution
+kernel timers.
 
 One might ask the question: we already have a timer subsystem
 (kernel/timers.c), why do we need two timer subsystems? After a lot of
@@ -78,7 +79,7 @@ from 250 Hz to 100 HZ (or even smaller).
 hrtimer subsystem implementation details
 
 
-the basic design considerations were:
+The basic design considerations were:
 
 - simplicity
 
@@ -87,7 +88,7 @@ the basic design considerations were:
 
 - simplification of existing, timing related kernel code
 
-another basic requirement was the immediate enqueueing and ordering of
+Another basic requirement was the immediate enqueueing and ordering of
 timers at activation time. After looking at several possible solutions
 such as radix trees and hashes, we chose the red black tree as the basic
 data structure. Rbtrees are available as a library in the kernel and are
@@ -138,23 +139,22 @@ details.)
 hrtimers - rounding of timer values
 ---
 
-the hrtimer code will round timer events to lower-resolution clocks
+The hrtimer code will round timer events to lower-resolution clocks
 because it has to. Otherwise it will do no artificial rounding at all.
 
-one question is, what resolution value should be returned to the user by
+One question is, what resolution value should be returned to the user by
 the clock_getres() interface. This will return whatever real resolution
 a given clock has - be it low-res, high-res, or artificially-low-res.
 
 hrtimers - testing and verification
---
+---
 
-We used the high-resolution clock subsystem ontop of hrtimers to verify
+We used the high-resolution clock subsystem on top of hrtimers to verify
 the hrtimer implementation details in praxis, and we also ran the posix
 timer tests in order to ensure specification compliance. We also ran
 tests on low-resolution clocks.
 
-The hrtimer patch converts the following kernel functionality to use
-hrtimers:
+hrtimers converts the following kernel functionality to use it:
 
  - nanosleep
  - itimers

Re: MAINTAINERS file out of date?

2007-04-26 Thread Eric W. Biederman

WANG Cong <[EMAIL PROTECTED]> writes:

>
> Thanks for your point. 
> I know little about virtualization, maybe can't help much. But I am interested
> in other things you mentioned. AFAIK, segments can't be avoided on i386, and
> Linux uses them very little, how are they recalculated constantly?

Look in arch/i386/boot/setup.S; it runs in 16bit mode.  We are talking about
real mode segments, not 16bit segments.

16bit real mode is a completely different ball game, and it is why we keep
BIOS calls isolated to that one dinky file.
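
To make "recalculated constantly" concrete, here is a small sketch of real mode address arithmetic. The helper name `real_mode_linear` is made up; the formula (segment shifted left by four, plus offset) is the standard real mode rule.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode linear address: 16-bit segment * 16 + 16-bit offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

Because one segment value only reaches 64k, code that touches more memory than that has to keep reloading segment registers, and the same byte can be named by many segment:offset pairs (0x07C0:0x0000 and 0x0000:0x7C00 are the same location).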

Eric

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread David Chinner

On Thu, Apr 26, 2007 at 07:53:57PM -0700, Andrew Morton wrote:
> On Fri, 27 Apr 2007 12:27:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote:
> > On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> > > On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:
> > > Also, afaict your important requirements would be met by retaining
> > > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> > > physically contiguous pages
> > 
> > Sure, that addresses the larger I/O side of things, but it doesn't address
> > the large filesystem blocksize issues that can only be solved with some kind
> > of page aggregation abstraction.
> 
> a) That wasn't a part of Christoph's original rationale list, so forgive
>me for thinking it is not so important and got snuck in post-facto when
>things got tough.

I've been pushing Christoph to do something like this for more than a year
purely so we can support large block sizes in XFS. He's got other reasons
for wanting to do this, but that doesn't mean that the large filesystem
blocksize issue is any less important.

>blocksizes via this scheme - instantiate and lock four pages and go for
>it.

So now how do you get block aligned writeback? Or make sure that truncate
doesn't race on a partial *block* truncate? You basically have to
jump through nasty, nasty hoops, to handle corner cases that are introduced
because the generic code can no longer reliably lock out access to a
filesystem block.

Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
doing everything inside the filesystem because it's the only sane
way to serialise access to these aggregated structures. This is
the way XFS used to work in its data path, and we all know how long
and loud people complained about that.

A filesystem specific aggregation mechanism is not a palatable solution
here because it drives filesystems away from being able to use generic
code. 
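
As a rough illustration of what "instantiate and lock four pages" implies, this sketch computes which 4k page cache indices back the filesystem block that contains a given byte offset. `block_pages`, `struct page_range` and `BASE_PAGE_SHIFT` are made-up names, and a 4k base page is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PAGE_SHIFT 12            /* assumed 4k base pages */

struct page_range {
    uint64_t first;                   /* first page index of the block */
    uint64_t last;                    /* last page index of the block */
};

/* Page cache indices (4k pages) backing the fs block that holds `pos`. */
static struct page_range block_pages(uint64_t pos, unsigned block_shift)
{
    struct page_range r;
    uint64_t block_start;

    assert(block_shift >= BASE_PAGE_SHIFT && block_shift < 64);
    block_start = (pos >> block_shift) << block_shift;
    r.first = block_start >> BASE_PAGE_SHIFT;
    r.last = r.first + (1ULL << (block_shift - BASE_PAGE_SHIFT)) - 1;
    return r;
}
```

With a 16k block, a truncate to byte 20000 lands in the block covering pages 4 to 7, so every path that needs the block stable (writeback, truncate, and so on) has to take and order four separate page locks instead of one.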

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [patch] unprivileged mounts update

2007-04-26 Thread Eric W. Biederman

Miklos Szeredi <[EMAIL PROTECTED]> writes:

>> On Apr 25 2007 11:21, Eric W. Biederman wrote:
>> >>
>> >> Why did we want to use fsuid, exactly?
>> >
>> >- Because ruid is completely the wrong thing we want mounts owned
>> >  by whomever's permissions we are using to perform the mount.
>> 
>> Think nfs. I access some nfs file as an unprivileged user. knfsd, by
>> nature, would run as euid=0, uid=0, but it needs fsuid=jengelh for
>> most permission logic to work as expected.
>
> I don't think knfsd will ever want to call mount(2).
>
> But yeah, I've been convinced, that using fsuid is the right thing to
> do.

Actually, knfsd does call mount: when it crosses a mount point on the nfs
server, it generates an equivalent mount point in linux.  At least I think
that is what it is doing.  It is very similar to our mount propagation
path.

However as a special case I don't think the permission checking is likely
to bite us there.  It is worth double checking once we have the other details
ironed out.

Eric

Re: Linux-2.6.21 hangs during post boot initialization phase

2007-04-26 Thread Peter Williams


Linus Torvalds wrote:
> On Fri, 27 Apr 2007, Peter Williams wrote:
> > The 2.6.21 kernel is hanging during the post boot phase where various daemons
> > are being started (not always the same daemon unfortunately).
> > 
> > This problem was not present in 2.6.21-rc7 and there is no oops or other
> > unusual output in the system log at the time the hang occurs.
> 
> Can you use "git bisect" to narrow it down a bit more? It's only 125 
> commits, so bisecting even just three or four kernels will narrow it down 
> to a handful.

Yes.  I'm just in the process of reading up on how to do the bisecting 
now.  Should have an answer in a few hours, I guess.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1

2007-04-26 Thread Andrew Morton

On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <[EMAIL PROTECTED]> wrote:

>   Apr 26 11:09:34 jupiter2 RTNL: assertion failed at 
> net/ipv4/devinet.c
>   (1055) Apr 26 11:09:34 jupiter2
>   Apr 26 11:09:34 jupiter2 Call Trace:
>   Apr 26 11:09:34 jupiter2   [] 
> inetdev_event+0x48/0x283
>   Apr 26 11:09:34 jupiter2 [] 
> _spin_lock_bh+0x9/0x19
>   Apr 26 11:09:34 jupiter2 [] 
> rt_run_flush+0x7e/0xaf
>   Apr 26 11:09:34 jupiter2 [] 
> notifier_call_chain+0x29/0x56 
>   Apr 26 11:09:34 jupiter2 [] 
> dev_set_mac_address+0x53/0x59 
>   Apr 26 11:09:34 jupiter2 [] 
> bo

This was due to locking bustage in the net tree.  It should be fixed in 
2.6.21-rc7-mm2.

Re: Linux 2.6.21

2007-04-26 Thread Mike Galbraith

On Thu, 2007-04-26 at 21:02 +0200, Willy Tarreau wrote:
> On Thu, Apr 26, 2007 at 09:40:26AM -0700, Linus Torvalds wrote:
> 
> > So we should have somebody like Christoph running -mm, and when things 
> > break, we'll just sic Christoph on whoever broke it, and teach people 
> > proper fear and respect!
> 
> And with Al Viro doing random code review and fill in the commits for
> regression fixes, even long established developers will check their code
> twice before submitting ;-)

Yeah!  We can call them The Black And Blues Brothers :)


Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Eric W. Biederman

Jens Axboe <[EMAIL PROTECTED]> writes:

> On Thu, Apr 26 2007, Eric W. Biederman wrote:
>
> Yep, if you could just have > PAGE_CACHE_SIZE blocks in the filesystem
> easily, the problem would basically be solved for cd and dvd packet
> writing.

Ok.  I'm not in a position to do this work.  But I will keep it in
mind and look at it.

>> Am I correct in assuming that the problem is primarily about getting
>> filesystems (and other upper layers) to submit BIOs that take into
>> consideration the larger block size of the underlying device, so
>> that read/modify write is not needed in the pktcdvd layer?
>
> Yes, that is exactly the problem. Once you have that, pktcdvd is pretty
> much reduced to setup and init code, the actual data handling can be
> done by sr or ide-cd directly. You could merge it into cdrom.c, it would
> not be very different from mt-rainier handling (which basically does RMW
> in firmware, so it works for any write, but performance is of course
> horrible if you don't do it right).

Thanks for the clarification.

So we do have a clear problem that we do not have generic support for
large sector sizes residing in the page cache.

There is one place where this has a direct effect: fs/block_dev.c

We have an indirect effect in the filesystems because there are a few
bits of generic support missing and there is no linux convention
on how to handle this case.

I expect if we can enhance fs/block_dev.c to handle this case the
other parts will fall out naturally.
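
To spell out what the read/modify/write discussed above amounts to, here is a toy model of a device that only accepts whole blocks. All names and sizes (`DEV_BLOCK`, `dev_read_block`, `dev_write_block`, `write_bytes`) are hypothetical; this is not pktcdvd code, just the general shape of the problem.

```c
#include <assert.h>
#include <string.h>

#define DEV_BLOCK  32   /* hypothetical device block size in bytes (tiny on purpose) */
#define DEV_BLOCKS 4

static unsigned char dev[DEV_BLOCKS * DEV_BLOCK];   /* pretend medium */
static int dev_reads, dev_writes;                   /* whole-block I/O counters */

static void dev_read_block(size_t blk, unsigned char *buf)
{
    memcpy(buf, dev + blk * DEV_BLOCK, DEV_BLOCK);
    dev_reads++;
}

static void dev_write_block(size_t blk, const unsigned char *buf)
{
    memcpy(dev + blk * DEV_BLOCK, buf, DEV_BLOCK);
    dev_writes++;
}

/* Write `len` bytes at byte offset `pos`. Any block that is only partly
 * covered must be read first so its untouched bytes survive the write. */
static void write_bytes(size_t pos, const unsigned char *src, size_t len)
{
    assert(pos <= sizeof(dev) && len <= sizeof(dev) - pos);
    while (len > 0) {
        unsigned char buf[DEV_BLOCK];
        size_t blk = pos / DEV_BLOCK;
        size_t in_blk = pos % DEV_BLOCK;
        size_t n = DEV_BLOCK - in_blk;

        if (n > len)
            n = len;
        if (n < DEV_BLOCK)
            dev_read_block(blk, buf);   /* partial block: read-modify-write */
        memcpy(buf + in_blk, src, n);
        dev_write_block(blk, buf);
        pos += n;
        src += n;
        len -= n;
    }
}
```

An aligned, block-sized write costs one device write and no read; anything smaller costs a read as well. If the upper layers only ever submitted aligned, block-sized I/O, the read path would never trigger.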

Eric


Re: Strange problem with Device Mapper

2007-04-26 Thread Andrew Morton

On Thu, 26 Apr 2007 19:12:27 +0200 "Paul Rolland" <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> I've a machine that has been installed with Intel Soft Raid on top
> of 2 SATA disks.
> I'm trying to have this work as a RAID-1 array. 
> Bios configuration has been done, using 128K chunk, and the kernel
> (2.6.20.7) sees perfectly /dev/mapper/isw__RAID1
> 
> But, I'm facing two problems :
> 1 - If i try to create partitions on this device, it does fail (the 
> values are not interpreted correctly)
> 
> 2 - To avoid 1), I stop the RAID array (dmraid -an), then I do create
> exactly the same partition set on /dev/sda, and /dev/sdb, and
> then I reactivate RAID (dmraid -ay).
> This allows me to see all the /dev/mapper/isw_xxx_RAID1p1, p2, ...
> But, running fsck -t ext2 /dev/mapper/isw_xxx_RAID1p1
> ends in a lock when the partition is larger than 10Go (well, it is
> Ok on the 10Go one, and it locks on the 100Go).
> Not a real hardlock, I still can switch to a new VC, but it's not
> more possible to start a new command or to stop fsck.
> 
> Should I do everything on the physical disks before activating RAID ?
> Is it a normal behavior ?
> 
> BTW, I also noticed that such a device doesn't support BLKRRPART ioctl...
> So it is possible that what I'm doing is wrong...
> 
> Any idea ?
> 

cc added.

Re: [patch] CFS scheduler, -v6

2007-04-26 Thread Mike Galbraith

On Thu, 2007-04-26 at 22:09 +0200, Kasper Sandberg wrote:
> On Thu, 2007-04-26 at 10:41 -0400, Gene Heskett wrote:
> 
> > 
> > Compared to mainline?  I still think this is a 100% keeper for desktop 
> > users 
> > like me.
> 
> Here its alot worse, just playing an ogg with ogg123 even without
> anything reniced (X is 0), just pressing a link in konqueror can make
> audio skip (ogg123 fails to fill the alsa buffer, and thus it skips).

What arguments are you passing to ogg123?  I'm creating some ogg files
right now, and would like to reproduce these skips to see why they're
happening if possible.

-Mike


"REPORT: sd-0.46 vs cfs-v6 vs mainline 2.6.21-rc7 Beryl + Video + Audio"

2007-04-26 Thread hechacker1

Hardware:
Dell Inspiron 700m laptop
1.7GHz Pentium M (Dothan 2M cache)
2GB RAM
1000Hz
Gentoo Linux
dyn-tick
700m # cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
10000 (microseconds, 10ms)
855gm integrated video/chipset
xf86-video-i810 (intel 1.7.4) DRI enabled
xorg-server-1.2.0-r3
beryl-core 0.3.0-svn
MPlayer dev-SVN-rUNKNOWN-4.1.2 - x11
Gnome totem 2.16.5 - x11-gstreamer
reiser4 w/cryptcompress

Screenshot:
http://ordorica.org/misc/beryl.png

muine playing mp3's off mounted windows share

Tests run under 16 bit color which provides a constant 75 fps
on one cube side (fps forced limited). Drops to ~45-50 fps during
animation/rotate/scale (depending on complexity of rendering)
Vsync off. 75Hz refresh 1280x800.

totem running fullscreen playing 700MB divx "An Inconvenient Truth.avi" on
one side of cube/desktop
gmplayer running fullscreen on another cube side (same file).

The given observations/numbers are when I move the cube with my mouse
and view two faces at one time (see screenshot). One face is playing the
totem video, the other containing my terminals.


Some numbers I've seen other people throw around:
I don't know their relevance.

cfs-v6:
700m kernel # cat sched_granularity_ns
500
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 5  0      0 221480    300 1394612    0    0   181     0 6068 5317 69  6 25  0
 4  0      0 220880    300 1395268    0    0   176     0 6147 5579 68  6 27  0
 1  0      0 220340    300 1395768    0    0   167     0 6052 5393 70  6 24  0
 6  0      0 219920    300 1396204    0    0   103     0 5830 5211 73  6 21  0

top - 18:31:17 up  7:45,  5 users,  load average: 5.18, 4.73, 4.28
Tasks:  98 total,   4 running,  94 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.6%us,  6.4%sy,  0.0%ni,  0.3%id,  0.0%wa,  1.3%hi,  0.3%si,  0.0%st
Mem:   2057700k total,  1845952k used,   211748k free,  300k buffers
Swap:   987988k total,        0k used,   987988k free,  1404040k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18046 hechacke  20   0  189m  83m  20m S 38.7  4.2  12:04.64 totem
18059 hechacke  20   0 51280  30m  18m R 25.8  1.5   9:47.36 gmplayer
12117 root  20   0  275m  54m  18m R 20.2  2.7  15:18.38 Xorg
22730 hechacke  20   0  119m  35m  18m R  5.3  1.7   0:12.68 mono
12350 hechacke  20   0 63820 6776 4328 S  3.6  0.3   2:20.36 beryl
16465 hechacke  20   0 43960  15m  10m S  2.3  0.8   0:07.14 gnome-terminal
12200 hechacke  20   0  5308 4016 1740 S  0.3  0.2   0:05.45 gconfd-2
12215 hechacke  20   0 38704 8956 7588 S  0.3  0.4   0:08.90 xfce4-clipman-p

Observation:
Music plays perfectly.
Audio of both videos plays perfectly.
New processes take forever to start. Firefox (already cached in RAM) takes
about 5 seconds to start, even right after closing it.
Browsing the web is slow.
Already open applications are responsive.
Behavior of video:
Both videos are moving forward. totem updates about every half second.
mplayer updates about every 3 seconds.

-

cfs-v6:
700m kernel # cat sched_granularity_ns
200
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 5  0      0  99604     44 1519364    0    0     0     0 3903 5575 91  5  5  0
 3  0      0  99512     44 1519364    0    0     0     0 5990 6783 72  5 23  0
 3  0      0 100412     44 1519364    0    0     0     0 6858 7261 67  5 28  0
 1  0      0 100412     44 1519364    0    0     0     0 7426 7634 62  4 34  0
 4  0      0 100288     44 1519364    0    0     0     0 7039 7442 60  6 34  0

top - 19:05:09 up  8:18,  5 users,  load average: 3.62, 4.16, 4.28
Tasks:  98 total,   4 running,  94 sleeping,   0 stopped,   0 zombie
Cpu(s): 69.8%us,  5.0%sy,  0.0%ni, 24.5%id,  0.0%wa,  0.7%hi,  0.0%si,  0.0%st
Mem:   2057700k total,  2009396k used,48304k free,  300k buffers
Swap:   987988k total,        0k used,   987988k free,  1555428k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18059 hechacke  20   0 51332  30m  18m R 30.8  1.5  18:48.17 gmplayer
18046 hechacke  20   0  189m  83m  20m S 20.9  4.2  23:25.49 totem
12117 root  20   0  276m  57m  18m S  9.6  2.8  20:59.01 Xorg
22730 hechacke  20   0  129m  36m  18m R  8.6  1.8   1:28.59 mono
22930 hechacke  20   0 65480 8392 4320 S  4.0  0.4   0:53.38 beryl
12213 hechacke  20   0 34472 7680 6484 S  0.7  0.4   1:16.41 xfce4-battery-p

Observation:
Music plays perfectly.
Audio of both videos plays perfectly.
New processes take forever to start.
Browsing the web is slow.
Already open applications are responsive.
Behavior of video:
Both videos are moving forward. totem updates about every 1/3 second.
mplayer updates about every second.
Rotating the

Re: [PATCH -mm] consolidate asm/const.h to linux/const.h

2007-04-26 Thread David Miller

From: Randy Dunlap <[EMAIL PROTECTED]>
Date: Thu, 26 Apr 2007 20:50:08 -0700

> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Make a global linux/const.h header file instead of having multiple,
> per-arch files, and convert current users of asm/const.h to use
> linux/const.h.
> 
> Built on x86_64 and sparc64.
> 
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

Re: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHY, resending

2007-04-26 Thread Benjamin Herrenschmidt

On Thu, 2007-04-26 at 18:18 -0700, Jeff Haran wrote:
> From: Jeff Haran  <[EMAIL PROTECTED]>
> 
> Resending with Outlook patch mangling hopefully corrected (Maybe I
> should write a HOWTO, this was harder than fixing the driver).

Note, sorry about that, still mangled :-(

Just send it as an attachment ... and get yourself a linux desktop :-)

Cheers,
Ben.


[PATCH -mm] consolidate asm/const.h to linux/const.h

2007-04-26 Thread Randy Dunlap

From: Randy Dunlap <[EMAIL PROTECTED]>

Make a global linux/const.h header file instead of having multiple,
per-arch files, and convert current users of asm/const.h to use
linux/const.h.

Built on x86_64 and sparc64.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/asm-sparc64/Kbuild|1 -
 include/asm-sparc64/const.h   |   19 ---
 include/asm-sparc64/lsu.h |2 +-
 include/asm-sparc64/mmu.h |2 +-
 include/asm-sparc64/page.h|2 +-
 include/asm-sparc64/pgtable.h |2 +-
 include/asm-sparc64/pstate.h  |2 +-
 include/asm-sparc64/sfafsr.h  |2 +-
 include/asm-x86_64/const.h|   20 
 include/asm-x86_64/page.h |2 +-
 include/asm-x86_64/pgtable.h  |2 +-
 include/linux/Kbuild  |1 +
 include/linux/const.h |   19 +++
 13 files changed, 28 insertions(+), 48 deletions(-)

--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/const.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/* const.h: Macros for dealing with constants.  */
-
-#ifndef _SPARC64_CONST_H
-#define _SPARC64_CONST_H
-
-/* Some constant macros are used in both assembler and
- * C code.  Therefore we cannot annotate them always with
- * 'UL' and other type specificers unilaterally.  We
- * use the following macros to deal with this.
- */
-
-#ifdef __ASSEMBLY__
-#define _AC(X,Y)   X
-#else
-#define _AC(X,Y)   (X##Y)
-#endif
-
-
-#endif /* !(_SPARC64_CONST_H) */
--- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/const.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/* const.h: Macros for dealing with constants.  */
-
-#ifndef _X86_64_CONST_H
-#define _X86_64_CONST_H
-
-/* Some constant macros are used in both assembler and
- * C code.  Therefore we cannot annotate them always with
- * 'UL' and other type specificers unilaterally.  We
- * use the following macros to deal with this.
- */
-
-#ifdef __ASSEMBLY__
-#define _AC(X,Y)   X
-#else
-#define __AC(X,Y)  (X##Y)
-#define _AC(X,Y)   __AC(X,Y)
-#endif
-
-
-#endif /* !(_X86_64_CONST_H) */
--- /dev/null
+++ linux-2.6.21-rc7-mm2/include/linux/const.h
@@ -0,0 +1,19 @@
+/* const.h: Macros for dealing with constants.  */
+
+#ifndef _LINUX_CONST_H
+#define _LINUX_CONST_H
+
+/* Some constant macros are used in both assembler and
+ * C code.  Therefore we cannot annotate them always with
+ * 'UL' and other type specifiers unilaterally.  We
+ * use the following macros to deal with this.
+ */
+
+#ifdef __ASSEMBLY__
+#define _AC(X,Y)   X
+#else
+#define __AC(X,Y)  (X##Y)
+#define _AC(X,Y)   __AC(X,Y)
+#endif
+
+#endif /* !(_LINUX_CONST_H) */
--- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/page.h
+++ linux-2.6.21-rc7-mm2/include/asm-x86_64/page.h
@@ -1,7 +1,7 @@
 #ifndef _X86_64_PAGE_H
 #define _X86_64_PAGE_H
 
-#include 
+#include 
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT 12
--- linux-2.6.21-rc7-mm2.orig/include/asm-x86_64/pgtable.h
+++ linux-2.6.21-rc7-mm2/include/asm-x86_64/pgtable.h
@@ -1,7 +1,7 @@
 #ifndef _X86_64_PGTABLE_H
 #define _X86_64_PGTABLE_H
 
-#include 
+#include 
 #ifndef __ASSEMBLY__
 
 /*
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/lsu.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/lsu.h
@@ -2,7 +2,7 @@
 #ifndef _SPARC64_LSU_H
 #define _SPARC64_LSU_H
 
-#include 
+#include 
 
 /* LSU Control Register */
 #define LSU_CONTROL_PM _AC(0x01fe,UL) /* Phys-watchpoint byte 
mask*/
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/mmu.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/mmu.h
@@ -1,8 +1,8 @@
 #ifndef __MMU_H
 #define __MMU_H
 
+#include 
 #include 
-#include 
 #include 
 
 #define CTX_NR_BITS13
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/page.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/page.h
@@ -5,7 +5,7 @@
 
 #ifdef __KERNEL__
 
-#include 
+#include 
 
 #if defined(CONFIG_SPARC64_PAGE_SIZE_8KB)
 #define PAGE_SHIFT   13
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/pgtable.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/pgtable.h
@@ -15,13 +15,13 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 
 /* The kernel image occupies 0x400 to 0x100 (4MB --> 32MB).
  * The page copy blockops can use 0x200 to 0x400.
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/pstate.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/pstate.h
@@ -2,7 +2,7 @@
 #ifndef _SPARC64_PSTATE_H
 #define _SPARC64_PSTATE_H
 
-#include 
+#include 
 
 /* The V9 PSTATE Register (with SpitFire extensions).
  *
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/sfafsr.h
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/sfafsr.h
@@ -1,7 +1,7 @@
 #ifndef _SPARC64_SFAFSR_H
 #define _SPARC64_SFAFSR_H
 
-#include 
+#include 
 
 /* Spitfire Asynchronous Fault Status register, ASI=0x4C VA<63:0>=0x0 */
 
--- linux-2.6.21-rc7-mm2.orig/include/asm-sparc64/Kbuild
+++ linux-2.6.21-rc7-mm2/include/asm-sparc64/Kbuild
@@ -8,7 +8,6 @@ header-y += apb.h
 header-y += asi.h
 header-y

Re: [00/17] Large Blocksize Support V3 (mmap conceptual discussion)

2007-04-26 Thread Christoph Lameter

On Thu, 26 Apr 2007, Andrew Morton wrote:

> > Sure, that addresses the larger I/O side of things, but it doesn't address
> > the large filesystem blocksize issues that can only be solved with some kind
> > of page aggregation abstraction.
> 
> a) That wasn't a part of Christoph's original rationale list, so forgive
>me for thinking it is not so important and got snuck in post-facto when
>things got tough.

It was definitely part of my thinking. I never thought anyone could do it 
differently so I did not emphasize it.

> b) I don't immediately see why a filesystam cannot implement larger
>blocksizes via this scheme - instantiate and lock four pages and go for
>it.
> 
> > Compound pages and high order page cache
> > indexing solves this extremely neatly, regardless of whether the compound
> > page is contiguous or not.
> 
> We cannot say anything about neatness until we've seen mmap.

Rough Draft was posted at 
http://marc.info/?l=linux-kernel&m=117709695522443&w=2
http://marc.info/?l=linux-kernel&m=117709215016822&w=2
http://marc.info/?l=linux-kernel&m=117709238129124&w=2

Basically 4k mmap semantics are preserved. One can mmap any 4k section of 
a compound page. State information is kept in the head page. So we have two
page struct pointers to juggle:

1. The one pointing to the page for address calculations COW etc.

2. The one pointing to the head page for state information.

For each 4k pointer from a process to a compound page we would 
have to take a refcount.
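
A very rough sketch of the two-pointer idea, using a simplified descriptor rather than the kernel's struct page. `struct page_desc`, `init_compound` and `map_subpage` are hypothetical names, and the sketch assumes the refcount is taken on the head page; the real patches may differ in the details.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a page descriptor; not the kernel's struct page. */
struct page_desc {
    struct page_desc *head;   /* head page of the compound page (self for head) */
    int refcount;             /* only meaningful on the head page */
};

/* Set up 2^order descriptors as one compound page. */
static void init_compound(struct page_desc *pages, unsigned order)
{
    size_t n = (size_t)1 << order;

    for (size_t i = 0; i < n; i++) {
        pages[i].head = &pages[0];
        pages[i].refcount = 0;
    }
    pages[0].refcount = 1;    /* the page cache's own reference */
}

/* Map one 4k section: `sub` is used for address calculations, the
 * returned head page carries the state and takes the reference. */
static struct page_desc *map_subpage(struct page_desc *sub)
{
    sub->head->refcount++;
    return sub->head;
}
```

Two processes mapping different 4k sections of the same order-2 page each bump the head's refcount, while the tail descriptors carry no state of their own.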

Re: Linux-2.6.21 hangs during post boot initialization phase

2007-04-26 Thread Linus Torvalds



On Fri, 27 Apr 2007, Peter Williams wrote:
>
> The 2.6.21 kernel is hanging during the post boot phase where various daemons
> are being started (not always the same daemon unfortunately).
> 
> This problem was not present in 2.6.21-rc7 and there is no oops or other
> unusual output in the system log at the time the hang occurs.

Can you use "git bisect" to narrow it down a bit more? It's only 125 
commits, so bisecting even just three or four kernels will narrow it down 
to a handful.
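
The arithmetic behind that, as a small sketch. It assumes an idealised linear history where each tested kernel discards half of the remaining suspects (keeping the larger half in the worst case); `remaining_after` is a made-up helper, not anything git provides.

```c
#include <assert.h>

/* Worst-case number of suspect commits left after `steps` bisection rounds. */
static unsigned remaining_after(unsigned commits, unsigned steps)
{
    while (steps-- > 0 && commits > 1)
        commits = (commits + 1) / 2;   /* keep the larger half */
    return commits;
}
```

Starting from 125 commits, three kernels leave at most 16 suspects, four leave at most 8, and seven pin it down to a single commit.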

Linus

Re: [PATCH] rtc-cmos: make it load on PNPBIOS systems

2007-04-26 Thread David Brownell

On Thursday 26 April 2007, Marko Vrh wrote:
> Replace CONFIG_PNPACPI with CONFIG_PNP, so it loads on
> ACPI-less PNPBIOS systems.
> 
> Signed-off-by: Marko Vrh <[EMAIL PROTECTED]>

Acked-by: David Brownell <[EMAIL PROTECTED]>

> 
> 
> 
> diff -urNp linux-2.6.21/drivers/rtc/rtc-cmos.c linux-2.6.21.new/drivers/rtc/rtc-cmos.c
> --- linux-2.6.21/drivers/rtc/rtc-cmos.c   2007-04-26 05:08:32.0 +0200
> +++ linux-2.6.21.new/drivers/rtc/rtc-cmos.c   2007-04-26 23:58:36.0 +0200
> @@ -379,12 +379,12 @@ static irqreturn_t cmos_interrupt(int ir
>   return IRQ_NONE;
>   }
> 
> -#ifdef   CONFIG_PNPACPI
> -#define  is_pnpacpi()   1
> +#ifdef   CONFIG_PNP
> +#define  is_pnp()   1
>   #define INITSECTION
> 
>   #else
> -#define  is_pnpacpi()   0
> +#define  is_pnp()   0
>   #define INITSECTION __init
>   #endif
> 
> @@ -427,7 +427,7 @@ cmos_do_probe(struct device *dev, struct
>* REVISIT for non-x86 systems we may need to handle io memory
>* resources: ioremap them, and request_mem_region().
>*/
> - if (is_pnpacpi()) {
> + if (is_pnp()) {
>   retval = request_resource(&ioport_resource, ports);
>   if (retval < 0) {
>   dev_dbg(dev, "i/o registers already in use\n");
> @@ -520,7 +520,7 @@ static void __exit cmos_do_remove(struct
> 
>   cmos_do_shutdown();
> 
> - if (is_pnpacpi())
> + if (is_pnp())
>   release_resource(cmos->iomem);
>   rename_region(cmos->iomem, NULL);
> 
> @@ -613,7 +613,7 @@ static int cmos_resume(struct device *de
>* the device node will always be created as a PNPACPI device.
>*/
> 
> -#ifdef   CONFIG_PNPACPI
> +#ifdef   CONFIG_PNP
> 
>   #include 
> 
> @@ -684,11 +684,11 @@ static void __exit cmos_exit(void)
>   }
>   module_exit(cmos_exit);
> 
> -#else   /* no PNPACPI */
> +#else   /* no PNP */
> 
>   /**/
> 
> -/* Platform setup should have set up an RTC device, when PNPACPI is
> +/* Platform setup should have set up an RTC device, when PNP is
>* unavailable ... this could happen even on (older) PCs.
>*/
> 
> @@ -734,7 +734,7 @@ static void __exit cmos_exit(void)
>   module_exit(cmos_exit);
> 
> 
> -#endif   /* !PNPACPI */
> +#endif   /* !PNP */
> 
>   MODULE_AUTHOR("David Brownell");
>   MODULE_DESCRIPTION("Driver for PC-style 'CMOS' RTCs");
> 



Re: [PATCH 2.6.22 5/5] iw_cxgb3: Update required firmware revision to 4.0.0.

2007-04-26 Thread Roland Dreier

 > Update required firmware revision to 4.0.0.

Hmm... should we fold this into the earlier patch, which actually
needs this new FW?  Or at least merge this patch first?

Also, is it cool with everyone to require a new FW, even for users who
might not be using (or even building) the RDMA driver?  I'm not sure
what a good solution would be really, so maybe the pain of forcing
everyone to update FW is the least bad thing to do.

 - R.

[PATCH] v3: utimensat implementation

2007-04-26 Thread Ulrich Drepper

One comment I got suggested cleaning up the overflow tests.  There
is no reason not to perform the full microsecond overflow test
in those two places.  The four conditions are reduced by the compiler
to just two tests, so there is no problem with performance.
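
As a userspace illustration of the microsecond overflow test described above, here is a minimal sketch. The helper name `timeval_to_timespec_checked` is hypothetical and does not appear in the patch; it only mirrors the range check and the usec-to-nsec scaling for a single value.

```c
#include <errno.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: reject an out-of-range microsecond field, then
 * convert to a timespec.  A compiler can typically fold the two range
 * conditions into one unsigned comparison.
 */
int timeval_to_timespec_checked(const struct timeval *in, struct timespec *out)
{
    if (in->tv_usec >= 1000000 || in->tv_usec < 0)
        return -EINVAL;

    out->tv_sec = in->tv_sec;
    out->tv_nsec = (long)in->tv_usec * 1000;   /* microseconds to nanoseconds */
    return 0;
}
```

A value of 999999 microseconds converts to 999999000 nanoseconds, while 1000000 or any negative value is rejected with -EINVAL.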

Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 796df69..12611c8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -714,9 +714,10 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad compat_sys_utimensat  /* 320 */
 ia32_syscall_end:  
diff --git a/fs/compat.c b/fs/compat.c
index 040a8be..d8a8cb5 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -79,28 +79,55 @@ int compat_printk(const char *fmt, ...)
  */
 asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __user *t)
 {
-   struct timeval tv[2];
+   struct timespec tv[2];
 
    if (t) {
        if (get_user(tv[0].tv_sec, &t->actime) ||
            get_user(tv[1].tv_sec, &t->modtime))
            return -EFAULT;
-       tv[0].tv_usec = 0;
-       tv[1].tv_usec = 0;
+       tv[0].tv_nsec = 0;
+       tv[1].tv_nsec = 0;
    }
    return do_utimes(AT_FDCWD, filename, t ? tv : NULL);
 }
 
+asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, struct compat_timespec __user *t)
+{
+   struct timespec tv[2];
+
+   if  (t) {
+       if (get_compat_timespec(&tv[0], &t[0]) ||
+           get_compat_timespec(&tv[1], &t[1]))
+           return -EFAULT;
+
+       if ((tv[0].tv_nsec == UTIME_OMIT || tv[0].tv_nsec == UTIME_NOW)
+           && tv[0].tv_sec != 0)
+           return -EINVAL;
+       if ((tv[1].tv_nsec == UTIME_OMIT || tv[1].tv_nsec == UTIME_NOW)
+           && tv[1].tv_sec != 0)
+           return -EINVAL;
+
+       if (tv[0].tv_nsec == UTIME_OMIT && tv[1].tv_nsec == UTIME_OMIT)
+           return 0;
+   }
+   return do_utimes(dfd, filename, t ? tv : NULL);
+}
+
 asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, struct compat_timeval __user *t)
 {
-   struct timeval tv[2];
+   struct timespec tv[2];
 
    if (t) {
        if (get_user(tv[0].tv_sec, &t[0].tv_sec) ||
-           get_user(tv[0].tv_usec, &t[0].tv_usec) ||
+           get_user(tv[0].tv_nsec, &t[0].tv_usec) ||
            get_user(tv[1].tv_sec, &t[1].tv_sec) ||
-           get_user(tv[1].tv_usec, &t[1].tv_usec))
+           get_user(tv[1].tv_nsec, &t[1].tv_usec))
            return -EFAULT;
+       if (tv[0].tv_nsec >= 1000000 || tv[0].tv_nsec < 0 ||
+           tv[1].tv_nsec >= 1000000 || tv[1].tv_nsec < 0)
+           return -EINVAL;
+       tv[0].tv_nsec *= 1000;
+       tv[1].tv_nsec *= 1000;
    }
    return do_utimes(dfd, filename, t ? tv : NULL);
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index 99cf2cb..d854af7 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ out:
  * must be owner or have write permission.
  * Else, update from *times, must be owner or super user.
  */
-long do_utimes(int dfd, char __user *filename, struct timeval *times)
+long do_utimes(int dfd, char __user *filename, struct timespec *times)
 {
int error;
struct nameidata nd;
@@ -100,11 +101,21 @@ long do_utimes(int dfd, char __user *filename, struct timeval *times)
 if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
 goto dput_and_out;
 
-   newattrs.ia_atime.tv_sec = times[0].tv_sec;
-   newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
-   newattrs.ia_mtime.tv_sec = times[1].tv_sec;
-   newattrs.ia_mtime.tv_nsec = times[1].tv_usec * 1000;
-   newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET;
+   if (times[0].tv_nsec == UTIME_OMIT)
+   newattrs.ia_valid &= ~ATTR_ATIME;
+   else if (times[0].tv_nsec != UTIME_NOW) {
+   newattrs.ia_atime.tv_sec = times[0].tv_sec;
+   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+   newattrs.ia_valid |= ATTR_ATIME_SET;
+   }
+
+   if (times[1].tv_nsec == UTIME_OMIT)
+   newattrs.ia_valid &= ~ATTR_MTIME;
+   else if (times[1].tv_nsec != UTIME_NOW) {
+

[PATCH] (i386) fix wrong comment for syscall stack layout

2007-04-26 Thread Satoru Takeuchi

Fix wrong comment for syscall stack layout.

The `ret_from_sys_call' label no longer exists; the `syscall_exit' label was
introduced instead.

Signed-off-by: Satoru Takeuchi <[EMAIL PROTECTED]>

Index: linux-2.6.21/arch/i386/kernel/entry.S
===
--- linux-2.6.21.orig/arch/i386/kernel/entry.S  2007-04-27 11:45:46.0 +0900
+++ linux-2.6.21/arch/i386/kernel/entry.S   2007-04-27 11:46:00.0 +0900
@@ -15,7 +15,7 @@
  * I changed all the .align's to 4 (16 byte alignment), as that's faster
  * on a 486.
  *
- * Stack layout in 'ret_from_system_call':
+ * Stack layout in 'syscall_exit':
  * ptrace needs to have all regs on the stack.
  * if the order here is changed, it needs to be
  * updated in fork.c:copy_process, signal.c:do_signal,

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Andrew Morton

On Fri, 27 Apr 2007 12:27:31 +1000 David Chinner <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> > On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:
> > 
> > > This patchset modifies the Linux kernel so that larger block sizes than
> > > page size can be supported. Larger block sizes are handled by using
> > > compound pages of an arbitrary order for the page cache instead of
> > > single pages with order 0.
> > 
> > Something I was looking for but couldn't find: suppose an application takes
> > a pagefault against the third 4k page of an order-2 pagecache "page".  We
> > need to instantiate a pte against find_get_page(offset/4)+3.  But these
> > patches don't touch mm/memory.c at all and filemap_nopage() appears to
> > return the zeroeth 4k page all the time in that case.
> > 
> > So.. what am I missing, and how does that part work?
> 
> "mmap not supported yet" ;)

erk.  I suspect this will have its sticky paws all over core mm.

> > Also, afaict your important requirements would be met by retaining
> > PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> > physically contiguous pages
> 
> Sure, that addresses the larger I/O side of things, but it doesn't address
> the large filesystem blocksize issues that can only be solved with some kind
> of page aggregation abstraction.

a) That wasn't a part of Christoph's original rationale list, so forgive
   me for thinking it is not so important and got snuck in post-facto when
   things got tough.

b) I don't immediately see why a filesystem cannot implement larger
   blocksizes via this scheme - instantiate and lock four pages and go for
   it.

> Compound pages and high order page cache
> indexing solves this extremely neatly, regardless of whether the compound
> page is contiguous or not.

We cannot say anything about neatness until we've seen mmap.

Re: 2.6.21-rc7-mm2 breaks 'lvm vgscan'.

2007-04-26 Thread Andrew Morton

On Thu, 26 Apr 2007 22:31:15 -0400 [EMAIL PROTECTED] wrote:

> On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/
> 
> This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)
> 
> gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch

Thanks for doing the bisection.  It is boring, but helps so much.

> Any ideas?

You cc'ed the right guy ;)

Re: [REPORT] First "glitch1" results, 2.6.21-rc7-git6-CFSv5 + SD 0.46

2007-04-26 Thread Ed Tomlinson

On Thursday 26 April 2007 18:56, Con Kolivas wrote:
> On Friday 27 April 2007 08:00, Bill Davidsen wrote:
> > Ingo Molnar wrote:
> > > * Ed Tomlinson <[EMAIL PROTECTED]> wrote:
> > >>> SD 0.46 1-2 FPS
> > >>> cfs v5 nice -19 219-233 FPS
> > >>> cfs v5 nice 0   1000-1996
> > >>
> > >>cfs v5 nice -10  60-65 FPS
> > >
> > > the problem is, the glxgears portion of this test is an _inverse_
> > > testcase.
> > >
> > > The reason? glxgears on true 3D hardware will _not_ use X, it will
> > > directly use the 3D driver of the kernel. So by renicing X to -19 you
> > > give the xterms more chance to show stuff - the performance of the
> > > glxgears will 'degrade' - but that is what you asked for: glxgears is
> > > 'just another CPU hog' that competes with X, it's not a "true" X client.
> > >
> > > if you are after glxgears performance in this test then you'll get the
> > > best performance out of this by renicing X to +19 or even SCHED_BATCH.
> >
> > Several points on this...
> >
> > First, I don't think this is accelerated in the way you mean, the
> > machine is a test server, with motherboard video using the 945G video
> > driver. Given the limitations of the support in that setup, I don't
> > think it qualified as "true 3D hardware," although I guess I could try
> > using the vesafb version as a test.
> >
> > The 2nd thing I note is that on FC6 this scheduler seems to confuse
> > 'top' to some degree, since the glxgears is shown as taking 51% of the
> > CPU (one core), while the state breakdown shows about 73% in idle,
> > waitio, and int. image attached.
> 
> top by itself certainly cannot be trusted to give true representation of the 
> cpu usage I'm afraid. It's not as convoluted as, say, trying to track memory 
> usage of an application, but top's resolution being tied to HZ accounting 
> makes it not reliable in that regard.
> >
> > After I upgrade the kernel and cfs to the absolute latest I'll repeat
> > this, as well as test with vesafb, and my planned run under heavy load.
> 
> I have a problem with your test case Bill. Its behaviour would depend on how 
> gpu bound vs cpu bound vs accelerated vs non-accelerated your graphics card 
> is. I get completely different results to those of the other testers given 
> the different hardware configuration and I don't think my results are 
> valuable. My problem with this testcase is - What would you define 
> as "perfect" behaviour for your test case? It seems far too arbitrary.

Con,

One thing I did not mention in all this is that renicing the glxgears process
to -10 gets SD to give about 1000 FPS; indeed, you get most of this performance
at -5 too. All in all, SD does a very good job here.

Get well soon!
Ed

Re: [mmc] alternative TI FM MMC/SD driver for 2.6.21-rc7

2007-04-26 Thread Alex Dubov

--- Pierre Ossman <[EMAIL PROTECTED]> wrote:

> Sergey Yanovich wrote:
> > 
> > I have found it easier to rewrite the driver, than to fix.
> 
> Before you get your hopes up, this development model is not one that will get
> your code merged upstream. You should really try to work with Alex, not side
> step him. Drivers are rarely complex enough to warrant, or even have room 
> for, a
> rewrite. And judging from your code it looks more like reorganising the code
> that's already there.

It is a sad truth. Instead of raising real issues that may remain in the
driver, I was presented with a "non-proof" that the bus-adapter-device
architecture I'm using is somehow bad and that the driver should be turned
into a monolithic blob, using config variables to disable unneeded
functionality. Considering that udev handles automatic loading of the drivers
just fine (so it's not an end-user issue at any rate), I don't see any
justification for the change.


2.6.21-rc7-mm2 breaks 'lvm vgscan'.

2007-04-26 Thread Valdis . Kletnieks

On Wed, 25 Apr 2007 22:57:16 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/

This addition in -rc7-mm1 breaks my laptop (Dell Latitude D820, x86_64 kernel)

gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch

The initrd on my system does an 'lvm vgscan' to get the root filesystem
accessible.  Under -rc5-mm2, this works fine.  For -rc7-mm[12], it finds the
disk:

ata_piix :00:1f.2: version 2.11
ata_piix :00:1f.2: MAP [ P0 P2 IDE IDE ]
ACPI: PCI Interrupt :00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device :00:1f.2 to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001bfa0 irq 14
ata2: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001bfa8 irq 15
ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata1.00: ATA-7: ST980825AS, 8.04, max UDMA/133
ata1.00: 156301488 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: ata_hpa_resize 1: sectors = 156301488, hpa_sectors = 156301488
ata1.00: configured for UDMA/133
ata2.00: ATAPI, max UDMA/33
ata2.00: configured for UDMA/33
scsi 0:0:0:0: Direct-Access ATA  ST980825AS   8.04 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk

sda1 is an ext3 /boot, sda2 is an LVM space covering the rest of the disk, so
we're doing well so far.

The 'lvm vgscan' fails and says 'No volume groups found', with no useful kernel
messages issued.  Then we get the infamous "Kernel panic - not syncing:
Attempted to kill init!" when it can't find the root file system and we fall
off the end of the initrd.

Any ideas?



Re: MAINTAINERS file out of date?

2007-04-26 Thread WANG Cong

On Thu, Apr 26, 2007 at 06:12:43PM -0700, H. Peter Anvin wrote:
>WANG Cong wrote:
>>>
>>> I have considered myself as a rather unofficial maintainer of this code,
>>> and wouldn't mind make it official now when I actually have a job which
>>> both cares about and actually can support my upstream Linux activities,
>>> which was a major pain for a while.
>>>
>>> Overall, there is a lot of cleanup which really is needed in the i386
>>> boot process; I have done some work on it already, but more is needed.
>> 
>> Sounds interesting. Can you point me to what needs to be done exactly?
>> Maybe I can help you. ;)
>
>There was a long thread on the linux-virtualization list
>(@lists.linux-foundation.org) just a few days ago.  The biggest single
>issue right now is probably how we transition from the bootup page
>tables to the "real" init_mm page tables, but the real-mode code also
>needs a massive overhaul (especially so since this code can and should
>be shared with x86-64); in particular I really want to get rid of the
>insane segment handling, where segments are constantly recalculated for
>no good reason.
>
>For the real-mode stuff, I have some patches already in the works for
>this.  Eric W. Biederman has also done a lot of work in this area.
>
>   -hpa

Thanks for the pointers.
I know little about virtualization, so I probably can't help much there. But I
am interested in the other things you mentioned. AFAIK, segments can't be
avoided on i386, and Linux uses them very little, so how are they recalculated
constantly?

Regards!


Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread David Chinner

On Thu, Apr 26, 2007 at 07:04:38PM -0700, Andrew Morton wrote:
> On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:
> 
> > This patchset modifies the Linux kernel so that larger block sizes than
> > page size can be supported. Larger block sizes are handled by using
> > compound pages of an arbitrary order for the page cache instead of
> > single pages with order 0.
> 
> Something I was looking for but couldn't find: suppose an application takes
> a pagefault against the third 4k page of an order-2 pagecache "page".  We
> need to instantiate a pte against find_get_page(offset/4)+3.  But these
> patches don't touch mm/memory.c at all and filemap_nopage() appears to
> return the zeroeth 4k page all the time in that case.
> 
> So.. what am I missing, and how does that part work?

"mmap not supported yet" ;)

> Also, afaict your important requirements would be met by retaining
> PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
> physically contiguous pages

Sure, that addresses the larger I/O side of things, but it doesn't address
the large filesystem blocksize issues that can only be solved with some kind
of page aggregation abstraction. Compound pages and high order page cache
indexing solves this extremely neatly, regardless of whether the compound
page is contiguous or not.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [PATCH] utimensat implementation

2007-04-26 Thread Ulrich Drepper

Neil Brown wrote:
> Does it also specify how to find out what granularity is used by the
> filesystem?  I had a need for this just recently and couldn't see any
> way to extract it.

That's still on the table.  We might end up with an fpathconf() solution.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖




Re: [patch] unprivileged mounts update

2007-04-26 Thread Serge E. Hallyn

Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > > > So then as far as you're concerned, the patches which were in -mm will
> > > > remain unchanged?
> > > 
> > > Basically yes. I've merged the update patch, which was not yet added
> > > to -mm, did some cosmetic code changes, and updated the patch headers.
> > > 
> > > There's one open point, that I think we haven't really explored, and
> > > that is the propagation semantics.  I think you had the idea, that a
> > > propagated mount should inherit ownership from the parent into which
> > > it was propagated.
> > 
> > Don't think that was me.  I stayed out of those early discussions
> > because I wasn't comfortable guessing at the proper semantics yet.
> 
> Yes, sorry, it was Eric's suggestion.
> 
> > But really, I, as admin, have to set up both propagation and user mounts
> > for a particular subtree, so why would I *not* want user mounts to be
> > propagated?
> > 
> > So, in my own situation, I have done
> > 
> > make / rshared
> > mount --bind /share /share
> > make /share unbindable
> > for u in $users; do
> >     mount --rbind / /share/$u/root
> >     make /share/$u/root rslave
> >     make /share/$u/root rshared
> >     mount --bind -o user=$u /share/$u/root/home/$u /share/$u/root/home/$u
> > done
> > 
> > All users get chrooted into /share/$USER/root, some also get their own
> > namespace.  Clearly if a user in a new namespace does
> > 
> > mount --bind -o user=me ~/somedir ~/otherdir
> > 
> > then logs out, and logs back in, I want the ~/otherdir in the new
> > namespace (and the one in the 'init' namespace) to also be owned by
> > 'me'.
> > 
> > > That sounds good if everyone agrees?
> > 
> > I've shown where I think propagating the mount owner is useful.  Can you
> > detail a scenario where doing so would be bad?  Then we can work toward
> > semantics that make sense...
> 
> But in your example, the "propagated mount inherits ownership from
> parent mount" would also work, since in all namespaces the owner of
> the parent would necessarily be "me".

true.

> The "inherits parent" semantics would work better for example in the
> "all nosuid" namespace, where the user is free to modify its mount
> namespace. 
> 
> If for example propagation is set up from the initial namespace to
> this user's namespace and a new mount is added to the initial
> namespace, it would be nice if the propagated new mount would also be
> owned by the user (and be "nosuid" of course).

ok, so in the example i gave, this would be the admin in the
initial namespace mounting something under /home/$USER/, which
gets propagated to slave /share/$USER/root/home/$USER, where
we would want a different mount owner.

> Does the above make sense?  I'm not sure I've explained clearly
> enough.

I think I see.  Sounds like inherit from parent does the right thing
all around, at least in cases we've thought of so far.

thanks,
-serge

Re: [ANNOUNCE] battery2 git repository

2007-04-26 Thread Anton Vorontsov

On Thu, Apr 26, 2007 at 05:09:28PM -0700, Greg KH wrote:
> On Fri, Apr 27, 2007 at 03:29:02AM +0400, Anton Vorontsov wrote:
> > 
> > To avoid confusion with David's own battery-2.6 repository, it's called
> > battery2-2.6, and can be found here:
> 
> So, I'm still not quite understanding this, is David giving up on his
> code now that you have pulled his changes into your tree?  Or is there
> some reason there are still two competing versions here?

[ Of course David will answer for himself better. Below is what I'm
  seeing. ]

David said he intends to port his OLPC and Apple PMU work to the battery2
class implementation (and I volunteered to assist, i.e. port them
myself ;-). So, once these two drivers are ported, there will be no
reason for battery-2.6 to exist anymore.

As for the competition... No competition took place at all; there were
two parallel efforts, each for an *independent* project (OLPC and
handhelds.org). battery2 should be suitable for both.

David?

> thanks,
> 
> greg k-h

Sincerely,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.org/bd2

Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Andrew Morton

On Tue, 24 Apr 2007 15:21:05 -0700 [EMAIL PROTECTED] wrote:

> This patchset modifies the Linux kernel so that larger block sizes than
> page size can be supported. Larger block sizes are handled by using
> compound pages of an arbitrary order for the page cache instead of
> single pages with order 0.

Something I was looking for but couldn't find: suppose an application takes
a pagefault against the third 4k page of an order-2 pagecache "page".  We
need to instantiate a pte against find_get_page(offset/4)+3.  But these
patches don't touch mm/memory.c at all and filemap_nopage() appears to
return the zeroeth 4k page all the time in that case.

So.. what am I missing, and how does that part work?
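
To make the index arithmetic in the question above concrete, here is a small C sketch. It only models the "offset/4" and "+n" calculation for an order-2 pagecache page; `PAGECACHE_ORDER`, `compound_index` and `subpage_index` are names invented for this example and are not taken from the patchset or from mm/.

```c
#define PAGECACHE_ORDER 2u   /* order-2: four 4k pages per pagecache "page" */

/* Index of the compound pagecache page that holds 4k page `offset`. */
unsigned long compound_index(unsigned long offset)
{
    return offset >> PAGECACHE_ORDER;                  /* offset / 4 */
}

/* Which 4k subpage inside that compound page a fault would need to map. */
unsigned long subpage_index(unsigned long offset)
{
    return offset & ((1ul << PAGECACHE_ORDER) - 1);    /* offset % 4 */
}
```

For 4k page offset 11 the lookup index is 2 and the subpage is 3, so the pte would have to point at the head page plus 3 rather than at the head page itself.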

Also, afaict your important requirements would be met by retaining
PAGE_CACHE_SIZE=4k and simply ensuring that pagecache is populated by
physically contiguous pages - so instead of allocating and adding one 4k
page, we allocate an order-2 page and sprinkle all four page*'s into the
radix tree in one hit.  That should be fairly straightforward to do, and
could be made indistinguishably fast from doing a single 16k page for some
common pagecache operations (gang-insert, gang-lookup).

The BIO and block layers will do-the-right-thing with that pagecache and
you end up with four times more data in the SG lists, worst-case.

Re: [PATCH] utimensat implementation

2007-04-26 Thread Neil Brown

On Thursday April 26, [EMAIL PROTECTED] wrote:
> The next revision of POSIX will support fine-grained filesystem
> timestamps the way we already support.  struct stat will report
> nanosecond values.  So far so good.  

Does it also specify how to find out what granularity is used by the
filesystem?  I had a need for this just recently and couldn't see any
way to extract it.

[If the mtime of a file matches the current time, then you cannot
cache the contents of the file.  You have to wait until the mtime is
in the past.  Without knowing the granularity, you cannot tell if the
mtime still matches current time or not]
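
The caching rule in the bracketed note can be written down as a short C sketch. It assumes the granularity is already known in nanoseconds (which is exactly what the question says cannot be queried today) and that timestamps are non-negative; `ts_to_ns` and `mtime_safe_to_cache` are hypothetical helper names.

```c
#include <stdbool.h>
#include <time.h>

/* Flatten a timespec into nanoseconds (assumes a non-negative time). */
long long ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * A file's contents may be cached only once the clock has moved into a
 * later granularity tick than the one the mtime falls in; otherwise a
 * further write could leave the mtime unchanged.
 */
bool mtime_safe_to_cache(const struct timespec *mtime,
                         const struct timespec *now,
                         long long granularity_ns)
{
    return ts_to_ns(now) / granularity_ns > ts_to_ns(mtime) / granularity_ns;
}
```

With a one-second granularity, an mtime of 100.0s is not yet safe at 100.5s but is safe at 101.0s; with a one-millisecond granularity the same mtime is already safe at 100.5s.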

Thanks,
NeilBrown

Re: [RFC] [PATCH] cpufreq: allow full selection of default governors

2007-04-26 Thread Dominik Brodowski

On Tue, Apr 24, 2007 at 08:03:27PM -0400, Dave Jones wrote:
> On Tue, Apr 24, 2007 at 03:05:36PM -0700, Nish Aravamudan wrote:
>  > On 4/24/07, Dave Jones <[EMAIL PROTECTED]> wrote:
>  > > On Tue, Apr 24, 2007 at 09:03:23PM +, William Heimbigner wrote:
>  > >  > The following patches should allow selection of conservative,
>  > >  > powersave, and ondemand in the kernel configuration.
>  > >
>  > > This has been rejected several times already.
>  > > Ondemand and conservative isn't a viable governor for all cpufreq
>  > > implementations (ie, ones with high switching latencies).
>  > 
>  > This piques my curiosity -- some governors don't work with some
>  > cpufreq implementations. Are those implementations in the kernel or in
>  > userspace? If in the kernel, then perhaps there should be some
>  > dependency expressed there in Kconfig between cpufreq implementation
>  > and the available governors
> 
> it can't be solved that easily. powernow-k8 for example is fine to
> use with ondemand on newer systems, where the latency is low.
> On older models however, it isn't.
> 
>  > > Also, see the
>  > > comment in the Kconfig a few lines above where you are adding this.
>  > 
>  > Are these governors unfixable? If
> 
> tbh, I've forgotten the original issues that caused the comment
> to be placed there. Dominik ?

Not unfixable, but: cpufreq is currently[*] built around the assumption that
at least one governor is correctly initialized or can be brought to work
when a CPU is registered with the cpufreq core.

Dominik


[*] That is, the last time I looked at it ;)

Re: [PATCH] change global zonelist order on NUMA v2

2007-04-26 Thread KAMEZAWA Hiroyuki

On Thu, 26 Apr 2007 18:25:10 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 27 Apr 2007, KAMEZAWA Hiroyuki wrote:
> 
> > > DMA memory.
> > > 
> > It seems a bit complicated. If we do so, following can occur,
> > 
> > Node1: cpu0,1,2,3
> > Node0: cpu4,5,6,7
> 
> We were discussing a two node NUMA system. If you have more put it onto 
> the last.
> 
Hmm, from a technical point of view, renumbering may be an option.
But I feel that it does not look natural and differs from what users expect...

-Kame


RE: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHYconfiguration

2007-04-26 Thread Benjamin Herrenschmidt

On Thu, 2007-04-26 at 17:28 -0700, Jeff Haran wrote:
> > -Original Message-
> > From: Benjamin Herrenschmidt [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, April 26, 2007 5:19 PM
> > To: Jeff Haran
> > Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 1/1] IBM PPC EMAC driver:improved support 
> > for PHYconfiguration
> > 
> > On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote:
> > > From: Jeff Haran  <[EMAIL PROTECTED]>
> > > 
> > > This patch fixes some problems I found while debugging the IBM EMAC 
> 
> ...
> 
> > Your patch appears to have been line wrapped by your mailer though...
> 
> Benjamin,
> 
> Friggin Outlook and our M$ centered corporate IT environment. Even when
> you tell it to send plain text it modifies it. This was my first attempt
> at patch submission, so I don't know all the hoops yet.
> 
> I can try sending the patch again if I can figure out how to disable the
> line wrap. Or, if you have an FTP server I can upload to, I can send you
> the patch file directly there.
> 
> Please let me know which you'd prefer.

In the worst case, just send the patch as a compressed attachment.

Cheers,
Ben.



PROBLEM

2007-04-26 Thread Thomas Kayser

PCI transparent bridge

This report is FYI, filed because of the kernel message.  Using the kernel 
parameter seems to work fine.

On boot there is a message that says this:

Kernel:PCI: Transparent bridge - :00:1e.0
Kernel:PCI: Bus #04 (-#07) is hidden behind transparent bridge #03 (-#04) (try 'pci=assign-busses')

When I pass 'pci=assign-busses' to the kernel then the message is only this:

Kernel:PCI: Transparent bridge - :00:1e.0

Keywords: pci transparent bridge

[EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/version
Linux version 2.6.20-04-26-07 (2.6.20-1) ([EMAIL PROTECTED]) (gcc version 4.1.2 
20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Apr 26 04:15:26 CDT 2007

Scripts/ver_linux output:

Linux debian5 2.6.20-04-26-07 #1 SMP Thu Apr 26 04:15:26 CDT 2007 i686 
GNU/Linux

Gnu C                  4.1.2
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.3-pre2
e2fsprogs              1.40-WIP
pcmciautils            014
pcmcia-cs              3.2.8
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.7
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.97
udev                   105
wireless-tools         28
Modules Loaded rfcomm l2cap bluetooth nfsd exportfs nfs lockd nfs_acl 
sunrpc ppdev lp thermal fan button processor ac battery ipv6 sbp2 scsi_mod 
loop eth1394 pcmcia firmware_class snd_hda_intel snd_hda_codec snd_pcm_oss 
snd_mixer_oss snd_pcm snd_timer yenta_socket snd rsrc_nonstatic i2c_i801 
iTCO_wdt ide_cd soundcore snd_page_alloc cdrom pcspkr parport_pc parport 
uhci_hcd ehci_hcd i2c_core iTCO_vendor_support r8169 sdhci mmc_core ohci1394 
ieee1394 pcmcia_core intel_agp agpgart serio_raw rtc usbcore psmouse ext3 jbd 
mbcache ide_disk generic piix ide_core evdev

[EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5600  @ 1.83GHz
stepping: 6
cpu MHz : 1828.883
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm 
constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips: 3661.67
clflush size: 64

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Core(TM)2 CPU T5600  @ 1.83GHz
stepping: 6
cpu MHz : 1828.883
cache size  : 2048 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm 
constant_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips: 3657.78
clflush size: 64

[EMAIL PROTECTED]:/usr/src/linux-source-2.6.20$ cat /proc/modules
rfcomm 35288 0 - Live 0xf8d0d000
l2cap 22016 5 rfcomm, Live 0xf8c73000
bluetooth 48484 4 rfcomm,l2cap, Live 0xf8c8e000
nfsd 202672 17 - Live 0xf8d86000
exportfs 5568 1 nfsd, Live 0xf8c36000
nfs 219308 4 - Live 0xf8d4f000
lockd 58056 3 nfsd,nfs, Live 0xf8c7e000
nfs_acl 3520 2 nfsd,nfs, Live 0xf8c22000
sunrpc 148284 13 nfsd,nfs,lockd,nfs_acl, Live 0xf8ce7000
ppdev 8644 0 - Live 0xf8c67000
lp 11816 0 - Live 0xf8c41000
thermal 13832 0 - Live 0xf8c3c000
fan 4676 0 - Live 0xf8c2a000
button 7760 0 - Live 0xf8ae7000
processor 30008 1 thermal, Live 0xf8c2d000
ac 5124 0 - Live 0xf8af
battery 9860 0 - Live 0xf8aec000
ipv6 233184 14 - Live 0xf8cad000
sbp2 21380 0 - Live 0xf8c1b000
scsi_mod 132812 1 sbp2, Live 0xf8c45000
loop 15816 0 - Live 0xf8adc000
eth1394 18500 0 - Live 0xf8ae1000
pcmcia 35436 0 - Live 0xf8ac2000
firmware_class 9536 1 pcmcia, Live 0xf89fc000
snd_hda_intel 19608 0 - Live 0xf8a8e000
snd_hda_codec 151680 1 snd_hda_intel, Live 0xf8af3000
snd_pcm_oss 38944 0 - Live 0xf8ab7000
snd_mixer_oss 15424 1 snd_pcm_oss, Live 0xf89f7000
snd_pcm 71172 3 snd_hda_intel,snd_hda_codec,snd_pcm_oss, Live 0xf8aa4000
snd_timer 20612 1 snd_pcm, Live 0xf89f
yenta_socket 24844 1 - Live 0xf89e8000
snd 48036 6 snd_hda_intel,snd_hda_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer, Live 0xf8a81000
rsrc_nonstatic 12096 1 yenta_socket, Live 0xf896c000
i2c_i801 7500 0 - Live 0xf897
iTCO_wdt 10596 0 - Live 0xf893c000
ide_cd 36576 0 - Live 0xf89de000
soundcore 7520 1 snd, Live

[PATCH -rt] Stop interrupt storm for fasteoi.

2007-04-26 Thread Steven Rostedt

Ingo,

I've spent several days banging my head on this bug, and I finally found
it. I originally thought we had a bug with the latency tracer, since it
seemed to only occur when I turned on latency tracing.  But I guess it
just changed the timings to cause the bug to happen. Now that I found
where the bug is, I don't know how it ever worked, even without the
tracing.

When taking an Ethernet interrupt, it was handled by handle_fasteoi_irq.

handle_fasteoi_irq, when the irq is handled by a thread, sets the irq
INPROGRESS and masks the irq.

Before leaving handle_fasteoi_irq, desc->chip->eoi is called.

For the apic, this will call move_native_irq.

In the -rt kernel, move_native_irq masks the irq, moves it, and then
blindly unmasks it.  So when interrupts are turned on next, we take an
interrupt storm.

The other handlers besides handle_fasteoi_irq mask the irq regardless of
whether the irq is INPROGRESS.  But handle_fasteoi_irq will not mask it,
if the irq is already INPROGRESS, and just returns. So we keep taking
the same interrupt.

So I changed move_native_irq to not mask and unmask if the irq is
currently INPROGRESS.  But... I'm not sure if this is ok or not, since I
don't know all the uses of this.
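
To make the sequence above easier to follow, here is a small userspace
model of it. This is only a sketch under stated assumptions: the sim_*
names and the struct are hypothetical stand-ins, not kernel code, and the
model ignores everything except the masked state and the INPROGRESS flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IRQ_INPROGRESS status bit. */
#define SIM_IRQ_INPROGRESS 0x1u

/* Hypothetical, much reduced model of an irq descriptor. */
struct sim_irq {
    unsigned int status; /* SIM_IRQ_* flags */
    bool masked;         /* true while the line is masked at the chip */
};

/* Behaviour described in the report: mask, move, then unmask blindly. */
static void sim_move_blind(struct sim_irq *d)
{
    d->masked = true;
    /* ... the migration itself would happen here ... */
    d->masked = false;
}

/* Proposed behaviour: leave the mask alone while the irq is in progress. */
static void sim_move_guarded(struct sim_irq *d)
{
    bool touch_mask = !(d->status & SIM_IRQ_INPROGRESS);

    if (touch_mask)
        d->masked = true;
    /* ... the migration itself would happen here ... */
    if (touch_mask)
        d->masked = false;
}

/*
 * Model of the threaded fasteoi path: mark the irq in progress, mask it,
 * then run the eoi step (which ends up in the move function).
 * Returns true if the line is still masked afterwards.
 */
static bool sim_fasteoi_leaves_masked(void (*move)(struct sim_irq *))
{
    struct sim_irq d = { 0, false };

    d.status |= SIM_IRQ_INPROGRESS;
    d.masked = true;
    move(&d);
    return d.masked;
}
```

With sim_move_blind the line comes back unmasked while the handler thread
has not run yet, which is the storm condition; with sim_move_guarded it
stays masked.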

My original patch was just to do a 

@@ -68,6 +68,9 @@ void move_native_irq(int irq)
        if (unlikely(desc->status & IRQ_DISABLED))
                return;
 
+       if (unlikely(desc->status & IRQ_INPROGRESS))
+               return;
+
        desc->chip->mask(irq);
        move_masked_irq(irq);
        desc->chip->unmask(irq);


But I thought that this might be too big a hammer for this nail. So I
changed it to the patch below.


Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>

Index: linux-2.6.21-rt1-i386/kernel/irq/migration.c
===
--- linux-2.6.21-rt1-i386.orig/kernel/irq/migration.c
+++ linux-2.6.21-rt1-i386/kernel/irq/migration.c
@@ -61,6 +61,7 @@ void move_masked_irq(int irq)
 void move_native_irq(int irq)
 {
struct irq_desc *desc = irq_desc + irq;
+   int mask = 1;
 
if (likely(!(desc->status & IRQ_MOVE_PENDING)))
return;
@@ -68,8 +69,17 @@ void move_native_irq(int irq)
if (unlikely(desc->status & IRQ_DISABLED))
return;
 
-   desc->chip->mask(irq);
+   /*
+* If the irq is already in progress, it should be masked.
+* If we unmask it, we might cause an interrupt storm on RT.
+*/
+   if (unlikely(desc->status & IRQ_INPROGRESS))
+   mask = 0;
+
+   if (mask)
+   desc->chip->mask(irq);
move_masked_irq(irq);
-   desc->chip->unmask(irq);
+   if (mask)
+   desc->chip->unmask(irq);
 }
 



RE: [PATCH] drivers/net: move the nvidia forcedeth driver from 100M group to 1000M group

2007-04-26 Thread Peer Chen

You are right, both ways would cause confusion, but sooner or later we need to 
move it, because our NICs from now on are all Gigabit and 100M NICs will 
gradually disappear. H. Peter's suggestion to have a single list for 100M and 
1000M is probably the better choice.

-Original Message-
From: Lennart Sorensen [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 25, 2007 10:06 PM
To: Peer Chen
Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PATCH] drivers/net: move the nvidia forcedeth driver from 100M 
group to 1000M group

On Wed, Apr 25, 2007 at 01:30:04PM +0800, Peer Chen wrote:
> nForce ethernet is a Gigabit NIC, not 100M; move it to the 1000M group to 
> avoid confusion.

The forcedeth on my nforce2 board is 100Mbit.  I think the driver handles both 
100Mbit and GBit type devices.  Makes for an interesting categorization 
problem.  Moving it would cause confusion.  Leaving it where it is could cause 
confusion.  Moving it may confuse existing users more though so I would 
recommend leaving it alone unless you can somehow make it appear in both.

--
Len Sorensen

Re: [PATCH] change global zonelist order on NUMA v2

2007-04-26 Thread Christoph Lameter

On Fri, 27 Apr 2007, KAMEZAWA Hiroyuki wrote:

> > DMA memory.
> > 
> It seems a bit complicated. If we do so, following can occur,
> 
> Node1: cpu0,1,2,3
> Node0: cpu4,5,6,7

We were discussing a two node NUMA system. If you have more put it onto 
the last.

[PATCH 2/2] iop13xx: msi support (rev6)

2007-04-26 Thread Dan Williams

From: Daniel Wolstenholme <[EMAIL PROTECTED]>

Enable devices to signal interrupts via PCI memory cycles.

rev6:
* fix enable/disable typo, Michael Ellerman

rev5:
* fix up ack, enable, and disable for iop13xx_msi_chip

rev4:
* move smp compile fix to separate patch
* use dynamic_irq_init in create_irq()
* hookup mask/unmask routines in iop13xx_msi_chip

rev3:
* change msi.c to use linux/smp.h instead of asm/smp.h
* call dynamic_irq_cleanup at destroy_irq time

rev2:
* destroy_irq did not take the full 128 bits of msi_irq_in_use into account
* added missing '&' for calls to test_and_set_bit and clear_bit
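
The rev2 note about the full 128 bits can be illustrated with a small
userspace sketch. It is not the patch's code: the sim_* names are
hypothetical, and it uses plain C bit operations instead of the kernel's
DECLARE_BITMAP, test_and_set_bit and clear_bit. The point is only that
both allocation and release have to index into every word of the bitmap,
not just the first one.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of a 128-entry "in use" bitmap. */
#define SIM_NUM_MSI_IRQS  128
#define SIM_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SIM_WORDS \
    ((SIM_NUM_MSI_IRQS + SIM_BITS_PER_WORD - 1) / SIM_BITS_PER_WORD)

static unsigned long sim_msi_in_use[SIM_WORDS];

/* Claim the lowest free slot; returns its index, or -1 if all are taken. */
static int sim_msi_alloc(void)
{
    for (int i = 0; i < SIM_NUM_MSI_IRQS; i++) {
        unsigned long *word = &sim_msi_in_use[i / SIM_BITS_PER_WORD];
        unsigned long bit = 1UL << (i % SIM_BITS_PER_WORD);

        if (!(*word & bit)) {
            *word |= bit;
            return i;
        }
    }
    return -1;
}

/* Release a slot anywhere in the 128-bit range, not only in word 0. */
static void sim_msi_free(int i)
{
    if (i < 0 || i >= SIM_NUM_MSI_IRQS)
        return;
    sim_msi_in_use[i / SIM_BITS_PER_WORD] &=
        ~(1UL << (i % SIM_BITS_PER_WORD));
}
```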

[EMAIL PROTECTED]: review comments/suggestions]
[EMAIL PROTECTED]: cleanups/forward port to 2.6-git]
Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Daniel Wolstenholme <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop13xx/Makefile |1 
 arch/arm/mach-iop13xx/irq.c|5 +
 arch/arm/mach-iop13xx/msi.c|  194 
 arch/arm/mach-iop13xx/pci.c|   16 +++
 include/asm-arm/arch-iop13xx/iop13xx.h |   29 +
 include/asm-arm/arch-iop13xx/irqs.h|8 +
 include/asm-arm/arch-iop13xx/msi.h |   11 ++
 7 files changed, 262 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-iop13xx/Makefile b/arch/arm/mach-iop13xx/Makefile
index 4185e05..02bd511 100644
--- a/arch/arm/mach-iop13xx/Makefile
+++ b/arch/arm/mach-iop13xx/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_ARCH_IOP13XX) += pci.o
 obj-$(CONFIG_ARCH_IOP13XX) += io.o
 obj-$(CONFIG_MACH_IQ81340SC) += iq81340sc.o
 obj-$(CONFIG_MACH_IQ81340MC) += iq81340mc.o
+obj-$(CONFIG_PCI_MSI) += msi.o
diff --git a/arch/arm/mach-iop13xx/irq.c b/arch/arm/mach-iop13xx/irq.c
index b2eb0b9..5791add 100644
--- a/arch/arm/mach-iop13xx/irq.c
+++ b/arch/arm/mach-iop13xx/irq.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* INTCTL0 CP6 R0 Page 4
  */
@@ -258,7 +259,7 @@ void __init iop13xx_init_irq(void)
write_intbase(INTBASE);
write_intsize(INTSIZE_4);
 
-   for(i = 0; i < NR_IOP13XX_IRQS; i++) {
+   for(i = 0; i <= IRQ_IOP13XX_HPI; i++) {
if (i < 32)
set_irq_chip(i, _irqchip1);
else if (i < 64)
@@ -271,4 +272,6 @@ void __init iop13xx_init_irq(void)
set_irq_handler(i, handle_level_irq);
set_irq_flags(i, IRQF_VALID | IRQF_PROBE);
}
+
+   iop13xx_msi_init();
 }
diff --git a/arch/arm/mach-iop13xx/msi.c b/arch/arm/mach-iop13xx/msi.c
new file mode 100644
index 000..1b64eaa
--- /dev/null
+++ b/arch/arm/mach-iop13xx/msi.c
@@ -0,0 +1,194 @@
+/*
+ * arch/arm/mach-iop13xx/msi.c
+ *
+ * PCI MSI support for the iop13xx processor
+ *
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+
+
+#define IOP13XX_NUM_MSI_IRQS 128
+static DECLARE_BITMAP(msi_irq_in_use, IOP13XX_NUM_MSI_IRQS);
+
+/* IMIPR0 CP6 R8 Page 1
+ */
+static inline u32 read_imipr_0(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c8, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_0(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c8, c1, 0"::"r" (val));
+}
+
+/* IMIPR1 CP6 R9 Page 1
+ */
+static inline u32 read_imipr_1(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c9, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_1(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c9, c1, 0"::"r" (val));
+}
+
+/* IMIPR2 CP6 R10 Page 1
+ */
+static inline u32 read_imipr_2(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c10, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_2(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c10, c1, 0"::"r" (val));
+}
+
+/* IMIPR3 CP6 R11 Page 1
+ */
+static inline u32 read_imipr_3(void)
+{
+   u32 val;
+   asm volatile("mrc p6, 0, %0, c11, c1, 0":"=r" (val));
+   return val;
+}
+static inline void write_imipr_3(u32 val)
+{
+   asm volatile("mcr p6, 0, %0, c11, c1, 0"::"r" (val));
+}
+
+static u32 (*read_imipr[])(void) = {
+   read_imipr_0,
+   read_imipr_1,
+   read_imipr_2,
+   read_imipr_3,
+};
+
+static void (*write_imipr[])(u32) = {
+

[PATCH 1/2] msi: fix ARM compile

2007-04-26 Thread Dan Williams

In file included from drivers/pci/msi.c:22:
include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory
include/asm/smp.h:20:3: #error " included in non-SMP build"
include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined
In file included from include/linux/sched.h:65,
 from include/linux/mm.h:4,
 from drivers/pci/msi.c:10:
include/linux/smp.h:85:1: warning: this is the location of the previous
definition

Tested on powerpc, i386, and x86_64.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---

 drivers/pci/msi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 435c195..c642e3b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -16,10 +16,10 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
-#include 
 
 #include "pci.h"
 #include "msi.h"

[PATCH 1/1] IBM PPC EMAC driver:improved support for PHY, resending

2007-04-26 Thread Jeff Haran

From: Jeff Haran  <[EMAIL PROTECTED]>

Resending with Outlook patch mangling hopefully corrected (maybe I
should write a HOWTO; this was harder than fixing the driver).

This patch fixes some problems I found while debugging the IBM EMAC
driver for PPC32 systems.

The first problem was in the function that configures the PHY for
autonegotiation, genmii_setup_aneg(). The original code does a
read/modify/write of the autonegotiation advertisement register (reg 4),
followed by a read/modify/write of the control register (reg 0). While
the original code follows the proper procedure as I read the IEEE
specs, what I found is that on at least one PHY model (National DP83843)
the read of the control register comes back with the soft reset bit set
(bit 15). Because of the read/modify/write operation, this causes the
write to write a 1 back to the reset bit, which initiates a software
reset of the PHY. This software reset causes the PHY to return to its
power-up state, which advertises all modes of operation, thus negating
the write to the autoneg advertisement register. The modification is to
spin reading the control register until the soft reset bit is clear
before doing the modify/write. I guess this bit is in reality more of
a "device busy" bit on at least some PHYs.

The second problem was in the function that configures the PHY for
forced operation, genmii_setup_forced(). The original code initiates a
software reset operation via a write of a 1 to bit 15 of the control
register (reg 0), but then proceeds to do a second write to that same
register without waiting until that reset bit is cleared by the PHY
itself (which according to the IEEE specs indicates that the PHY reset
is complete). This is a violation of how one is supposed to use this
software reset feature of these PHYs and I believe was the cause of
mysterious, difficult to reproduce link failures that we've observed on
some of our systems that use this driver. The fix is to modify the
function so that it spins waiting for the reset bit to clear after doing
the soft reset and before doing the subsequent write. Since this
modification, we haven't seen the mysterious link failures, though they
were so rare it's difficult to say at this point whether this was the
cause.

I also added some error handling and reporting for the abnormal case
where the reset bit never clears from the soft reset operation.
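
The wait-for-reset-to-clear idea described above can be tried outside the
kernel with a fake PHY. This is a sketch under stated assumptions, not
the driver code: struct sim_phy and the sim_* functions are hypothetical,
the fake PHY simply reports the reset bit as set for a fixed number of
reads, and the delay between polls is left out.

```c
#include <errno.h>

/* Bit 15 of the control register (reg 0), the soft reset bit. */
#define SIM_BMCR_RESET 0x8000

/* Hypothetical fake PHY: reports RESET set for the first busy_reads reads. */
struct sim_phy {
    int busy_reads; /* reads remaining that still show the reset bit */
    int bmcr;       /* control register value once reset has completed */
};

static int sim_phy_read_bmcr(struct sim_phy *phy)
{
    if (phy->busy_reads > 0) {
        phy->busy_reads--;
        return phy->bmcr | SIM_BMCR_RESET;
    }
    return phy->bmcr;
}

/*
 * Poll the control register until the reset bit clears or the poll
 * budget runs out.
 *
 * Returns the register value (>= 0) on success, -EBUSY if the bit never
 * cleared, or the negative read error if the last read failed.
 */
static int sim_spin_reset_complete(struct sim_phy *phy, int limit)
{
    int val = -EBUSY;

    while (limit-- > 0) {
        val = sim_phy_read_bmcr(phy);
        if (val >= 0 && (val & SIM_BMCR_RESET) == 0)
            return val; /* reset finished, safe to modify and write */
        /* a real driver would delay here before polling again */
    }
    return (val < 0) ? val : -EBUSY;
}
```

A caller would only do its modify/write of the control register after
this returns a non-negative value.
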
Applied to kernel version 2.6.21.
Signed-off-by: Jeff Haran <[EMAIL PROTECTED]>
---
--- linux-2.6.21/drivers/net/ibm_emac/ibm_emac_phy.c.orig  2007-04-25 20:08:32.0 -0700
+++ linux-2.6.21/drivers/net/ibm_emac/ibm_emac_phy.c  2007-04-26 14:42:09.562996000 -0700
@@ -22,8 +22,12 @@
 
 #include 
 
+#include "ibm_emac_core.h"
 #include "ibm_emac_phy.h"
 
+#define NL "\n"
+#define PHY_DBG(f,x...) printk("emac" f, ##x)
+
 static inline int phy_read(struct mii_phy *phy, int reg)
 {
return phy->mdio_read(phy->dev, phy->address, reg);
@@ -34,11 +38,34 @@ static inline void phy_write(struct mii_
phy->mdio_write(phy->dev, phy->address, reg, val);
 }
 
-int mii_reset_phy(struct mii_phy *phy)
+/*
+ * polls MII_BMCR until BMCR_RESET bit clears or operation times out.
+ *
+ * returns:
+ * >= 0 => success, value in BMCR returned to caller
+ * -EBUSY => failure, RESET bit never cleared
+ * otherwise => failure, lower level PHY read failed
+ */
+
+static int mii_spin_reset_complete(struct mii_phy *phy)
 {
int val;
int limit = 1;
 
+   while (limit--) {
+   val = phy_read(phy, MII_BMCR);
+   if ((val >= 0) && ((val & BMCR_RESET) == 0))
+   return val; /* success */
+   udelay(10);
+   }
+
+   return (val < 0) ? val : -EBUSY;
+}
+
+int mii_reset_phy(struct mii_phy *phy)
+{
+   int val;
+
val = phy_read(phy, MII_BMCR);
val &= ~BMCR_ISOLATE;
val |= BMCR_RESET;
@@ -46,16 +73,17 @@ int mii_reset_phy(struct mii_phy *phy)
 
udelay(300);
 
-   while (limit--) {
-   val = phy_read(phy, MII_BMCR);
-   if (val >= 0 && (val & BMCR_RESET) == 0)
-   break;
-   udelay(10);
+   val = mii_spin_reset_complete(phy);
+
+   if (val < 0) {
+   PHY_DBG("%d: reset_complete failed in reset %d" NL,
+   ((struct ocp_enet_private *)
(phy->dev->priv))->def->index, val);
+   } else {
+   if (val & BMCR_ISOLATE)
+   phy_write(phy, MII_BMCR, val & ~BMCR_ISOLATE);
}
-   if ((val & BMCR_ISOLATE) && limit > 0)
-   phy_write(phy, MII_BMCR, val & ~BMCR_ISOLATE);
 
-   return limit <= 0;
+   return val < 0;
 }
 
 static int genmii_setup_aneg(struct mii_phy *phy, u32 advertise)
@@ -102,9 +130,18 @@ static int genmii_setup_aneg(struct mii_
}
 
/* Start/Restart aneg */
-   ctl = phy_read(phy, MII_BMCR);
-   ctl |= (BMCR_ANENABLE | BMCR_ANRESTART);
-

Pin an IRQ to a particular CPU

2007-04-26 Thread Phy Prabab


Hello,

In an effort to increase the overall throughput of my Linux NFS file
server, I thought about trying to force the NIC's IRQ to be
serviced by a particular CPU.  Is this possible?

TIA,
Phy
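
For reference, 2.6 kernels expose a per-IRQ CPU mask in
/proc/irq/<N>/smp_affinity (described in Documentation/IRQ-affinity.txt);
writing a hexadecimal bitmask there selects the CPUs allowed to service
that IRQ. The sketch below shows one way to do that from C. The function
names are hypothetical, it handles only CPUs 0 to 31, it needs root, and
a userspace balancer such as irqbalance, if running, may later change the
mask again.

```c
#include <stdio.h>

/*
 * Format the hex mask that selects a single CPU (0..31) into buf.
 * Returns 0 on success, -1 on bad arguments.
 */
static int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32 || len < 9)
        return -1;
    snprintf(buf, len, "%x", 1u << cpu);
    return 0;
}

/*
 * Write the single-CPU mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 on any failure (bad cpu, no permission, ...).
 */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
    char path[64];
    char mask[16];
    FILE *f;

    if (affinity_mask_for_cpu(cpu, mask, sizeof(mask)) != 0)
        return -1;

    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(mask, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The IRQ number for the NIC can be read from /proc/interrupts; whether
pinning actually helps NFS throughput depends on the workload and has to
be measured.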

Re: linux-2.6.21-rc7-mm2 fails to compile

2007-04-26 Thread Randy Dunlap

On Thu, 26 Apr 2007 17:54:20 -0700 Andrew Morton wrote:

> On Fri, 27 Apr 2007 00:00:15 + (GMT) William Heimbigner <[EMAIL PROTECTED]>
> wrote:
> 
> > Output leading up to the error:
> > 
> >CC  drivers/macintosh/macio-adb.o
> >LD  drivers/macintosh/built-in.o
> >CC [M]  drivers/macintosh/apm_emu.o
> >CC [M]  drivers/macintosh/therm_windtunnel.o
> > drivers/macintosh/therm_windtunnel.c: In function 'therm_of_remove':
> > drivers/macintosh/therm_windtunnel.c:462: error: void value not ignored as it ought to be
> > drivers/macintosh/therm_windtunnel.c:463: warning: control reaches end of non-void function
> > make[2]: *** [drivers/macintosh/therm_windtunnel.o] Error 1
> > make[1]: *** [drivers/macintosh] Error 2
> > make: *** [drivers] Error 2
> > 
> > This is on an iMac G3 powerpc.
> 
> i2c_del_driver() was changed to return void in the i2c tree.
> 
> --- a/drivers/macintosh/therm_windtunnel.c~fix-jdelvare-i2c-i2c-del-driver-returns-void-on-powerpc
> +++ a/drivers/macintosh/therm_windtunnel.c
> @@ -459,7 +459,8 @@ therm_of_probe( struct of_device *dev, c
>  static int
>  therm_of_remove( struct of_device *dev )
>  {
> - return i2c_del_driver( _driver );
> + i2c_del_driver( _driver );
> + return 0;
>  }
>  
>  static struct of_device_id therm_of_match[] = {{


This will fail on the "ugly parentheses" test...  :)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: MAINTAINERS file out of date?

2007-04-26 Thread H. Peter Anvin

WANG Cong wrote:
>>
>> I have considered myself a rather unofficial maintainer of this code,
>> and wouldn't mind making it official now that I actually have a job which
>> both cares about and can actually support my upstream Linux activities,
>> which was a major pain for a while.
>>
>> Overall, there is a lot of cleanup which really is needed in the i386
>> boot process; I have done some work on it already, but more is needed.
> 
> Sounds interesting. Can you point me to what exactly needs to be done? Maybe I 
> can help you. ;)

There was a long thread on the linux-virtualization list
(@lists.linux-foundation.org) just a few days ago.  The biggest single
issue right now is probably how we transition from the bootup page
tables to the "real" init_mm page tables, but the real-mode code also
needs a massive overhaul (especially so since this code can and should
be shared with x86-64); in particular I really want to get rid of the
insane segment handling, where segments are constantly recalculated for
no good reason.

For the real-mode stuff, I have some patches already in the works for
this.  Eric W. Biederman has also done a lot of work in this area.

-hpa

Re: [PATCH] utimensat implementation

2007-04-26 Thread Ulrich Drepper

H. Peter Anvin wrote:
> Primarily to let a backup program restore the full state of the filesystem.

Is this wanted?  Or needed?  I would think there are good reasons why
this hasn't been done so far.  Intrusion detection is one reason I can
think of.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖




Re: MAINTAINERS file out of date?

2007-04-26 Thread WANG Cong

On Thu, Apr 26, 2007 at 04:13:01PM -0700, H. Peter Anvin wrote:
>Michael McConnell wrote:
>> Hi folks,
>> 
>> I noticed the MAINTAINERS file still lists Riley Williams as the
>> maintainer of the i386 boot code, presumably as no-one else has taken it
>> up in his absence (though, I'm sure it's probably been touched since
>> late 2004).
>> 
>> If I knew the first thing about the i386 boot process I would volunteer,
>> but sadly I don't
>> 
>> This article below details the reason for his absence, but please make
>> sure you're sitting down before clicking. (It is work-safe.)
>> http://www.chorleytoday.co.uk/ViewArticle2.aspx?SectionID=81=905248
>> 
>
>Dear.
>
>I have considered myself a rather unofficial maintainer of this code,
>and wouldn't mind making it official now that I actually have a job which
>both cares about and can actually support my upstream Linux activities,
>which was a major pain for a while.
>
>Overall, there is a lot of cleanup which really is needed in the i386
>boot process; I have done some work on it already, but more is needed.
>

Sounds interesting. Can you point me to what exactly needs to be done? Maybe I 
can help you. ;)

Have fun!




Re: [PATCH 1/3] ia64: convert to use clocksource code

2007-04-26 Thread Chris Wright

* Peter Keilty ([EMAIL PROTECTED]) wrote:
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 6077300..35ad71f 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -480,10 +480,12 @@ #endif
>   /* Get end time (ticks) */
>   t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>  
> +#ifndef CONFIG_IA64
>  #ifdef CONFIG_GENERIC_TIME
>   /* TSC halts in C2, so notify users */
>   mark_tsc_unstable();
>  #endif
> +#endif

Is this a better description of the dependency?

  #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)

>   /* Re-enable interrupts */
>   local_irq_enable();
>   current_thread_info()->status |= TS_POLLING;
> @@ -522,10 +524,12 @@ #endif
>   acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0);
>   }
>  
> +#ifndef CONFIG_IA64
>  #ifdef CONFIG_GENERIC_TIME
>   /* TSC halts in C3, so notify users */
>   mark_tsc_unstable();
>  #endif
> +#endif

ditto

>   /* Re-enable interrupts */
>   local_irq_enable();
>   current_thread_info()->status |= TS_POLLING;
> diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
> index 0be700f..5ea7d3e 100644
> --- a/drivers/char/hpet.c
> +++ b/drivers/char/hpet.c
> @@ -29,6 +29,7 @@ #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -51,8 +52,34 @@ #defineHPET_DRIFT  (500)
>  
>  #define HPET_RANGE_SIZE  1024/* from HPET spec */
>  
> +#if BITS_PER_LONG == 64
> +#define  write_counter(V, MC)writeq(V, MC)
> +#define  read_counter(MC)readq(MC)
> +#else
> +#define  write_counter(V, MC)writel(V, MC)
> +#define  read_counter(MC)readl(MC)
> +#endif
> +
>  static u32 hpet_nhpet, hpet_max_freq = HPET_USER_FREQ;
>  
> +static void __iomem *hpet_mc_ptr;

CodingStyle nit: we don't need all this _ptr...

> +static cycle_t read_hpet(void)
> +{
> + return (cycle_t)read_counter((void __iomem *)hpet_mc_ptr);
> +}
> +
> +static struct clocksource clocksource_hpet = {
> +.name   = "hpet",
> +.rating = 300,
> +.read   = read_hpet,
> +.mask   = 0xLL,
> +.mult   = 0, /*to be caluclated*/
> +.shift  = 10,
> +.is_continuous  = 1,
> +};
> +static struct clocksource *hpet_clocksource_p;

and _p naming.

> +
>  /* A lock for concurrent access by app and isr hpet activity. */
>  static DEFINE_SPINLOCK(hpet_lock);
>  /* A lock for concurrent intermodule access to hpet and isr hpet activity. */
> @@ -79,7 +106,7 @@ struct hpets {
>   struct hpets *hp_next;
>   struct hpet __iomem *hp_hpet;
>   unsigned long hp_hpet_phys;
> - struct time_interpolator *hp_interpolator;
> + struct clocksource *hp_clocksource;
>   unsigned long long hp_tick_freq;
>   unsigned long hp_delta;
>   unsigned int hp_ntimer;
> @@ -94,13 +121,6 @@ #define   HPET_IE 0x0002  /* interrupt en
>  #define  HPET_PERIODIC   0x0004
>  #define  HPET_SHARED_IRQ 0x0008
>  
> -#if BITS_PER_LONG == 64
> -#define  write_counter(V, MC)writeq(V, MC)
> -#define  read_counter(MC)readq(MC)
> -#else
> -#define  write_counter(V, MC)writel(V, MC)
> -#define  read_counter(MC)readl(MC)
> -#endif
>  
>  #ifndef readq
>  static inline unsigned long long readq(void __iomem *addr)
> @@ -737,27 +757,6 @@ static ctl_table dev_root[] = {
>  
>  static struct ctl_table_header *sysctl_header;
>  
> -static void hpet_register_interpolator(struct hpets *hpetp)
> -{
> -#ifdef   CONFIG_TIME_INTERPOLATION
> - struct time_interpolator *ti;
> -
> - ti = kzalloc(sizeof(*ti), GFP_KERNEL);
> - if (!ti)
> - return;
> -
> - ti->source = TIME_SOURCE_MMIO64;
> - ti->shift = 10;
> - ti->addr = >hp_hpet->hpet_mc;
> - ti->frequency = hpetp->hp_tick_freq;
> - ti->drift = HPET_DRIFT;
> - ti->mask = -1;
> -
> - hpetp->hp_interpolator = ti;
> - register_time_interpolator(ti);
> -#endif
> -}
> -
>  /*
>   * Adjustment for when arming the timer with
>   * initial conditions.  That is, main counter
> @@ -909,7 +908,14 @@ int hpet_alloc(struct hpet_data *hdp)
>   }
>  
>   hpetp->hp_delta = hpet_calibrate(hpetp);
> - hpet_register_interpolator(hpetp);
> +
> + if (!hpet_clocksource_p) {
> + clocksource_hpet.fsys_mmio_ptr = hpet_mc_ptr = &hpetp->hp_hpet->hpet_mc;
> + clocksource_hpet.mult = clocksource_hz2mult(hpetp->hp_tick_freq,
> + clocksource_hpet.shift);
> + clocksource_register(&clocksource_hpet);
> + hpet_clocksource_p = hpetp->hp_clocksource = &clocksource_hpet;
> + }

This looks like a change in behaviour for non-ia64.  Now i386 and x86_64
will

Re: [patch] CFS scheduler, -v6

2007-04-26 Thread Gene Heskett

On Thursday 26 April 2007, Con Kolivas wrote:
>On Friday 27 April 2007 10:39, Gene Heskett wrote:
>> Not necessarily Con.  Do you have a fresh one for 2.6.21?
>
>Since people get nervous about any rejects here is an (otherwise identical)
>patch for 2.6.21
>
>http://ck.kolivas.org/patches/staircase-deadline/2.6.21-sd-0.46.patch

Thanks, I'll try to give it some cpu time in the next day.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Our informal mission is to improve the love life of operators worldwide.
-- Peter Behrendt, president of Exabyte
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86_64: dynamic MCE poll interval

2007-04-26 Thread Tim Hockin


From: Tim Hockin <[EMAIL PROTECTED]>

Background:
 We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
 especially when we are trying really hard to stress the system out.  The
 current MCE poller uses a static interval which does not care whether it
 has or has not found MCEs recently.

Description:
 This patch makes the MCE poller adjust the polling interval dynamically.
 If we find an MCE, poll 2x faster (down to 10 ms).  When we stop finding
 MCEs, poll 2x slower (up to check_interval seconds).  The check_interval
 tunable becomes the max polling interval.

Result:
 If you start to take a lot of correctable errors (not exceptions), you
 log them faster and more accurately (less chance of overflowing the MCA
 registers).  If you don't take a lot of errors, you will see no change.

Alternatives:
 I considered simply reducing the polling interval to 10 ms immediately
 and keeping it there as long as we continue to find errors.  This felt a
 bit heavy handed, but does perform significantly better for the default
 check_interval of 5 minutes (we're using a few seconds when testing for
 DRAM errors).

Patch:
 This patch is against 2.6.21-rc7.

Signed-Off-By: Tim Hockin <[EMAIL PROTECTED]>

---

diff -pruN linux-2.6.20/arch/x86_64/kernel/mce.c linux-2.6.20+th/arch/x86_64/kernel/mce.c
--- linux-2.6.20/arch/x86_64/kernel/mce.c   2007-04-24 23:36:04.0 -0700
+++ linux-2.6.20+th/arch/x86_64/kernel/mce.c    2007-04-26 10:40:29.0 -0700
@@ -327,6 +327,7 @@ void mce_log_therm_throt_event(unsigned
 */

static int check_interval = 5 * 60; /* 5 minutes */
+static int next_interval; /* in jiffies */
static void mcheck_timer(struct work_struct *work);
static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);

@@ -339,7 +340,6 @@ static void mcheck_check_cpu(void *info)
static void mcheck_timer(struct work_struct *work)
{
on_each_cpu(mcheck_check_cpu, NULL, 1, 1);
-   schedule_delayed_work(&mcheck_work, check_interval * HZ);

/*
 * It's ok to read stale data here for notify_user and
@@ -349,17 +349,24 @@ static void mcheck_timer(struct work_str
 * writes.
 */
if (notify_user && console_logged) {
+   /* if we logged an MCE, reduce the polling interval */
+   next_interval = max(next_interval/2, HZ/100);
notify_user = 0;
clear_bit(0, &console_logged);
printk(KERN_INFO "Machine check events logged\n");
+   } else {
+   next_interval = min(next_interval*2, check_interval*HZ);
}
+
+   schedule_delayed_work(&mcheck_work, next_interval);
}


static __init int periodic_mcheck_init(void)
{
-   if (check_interval)
-   schedule_delayed_work(&mcheck_work, check_interval*HZ);
+   next_interval = check_interval * HZ;
+   if (next_interval)
+   schedule_delayed_work(&mcheck_work, next_interval);
return 0;
}
__initcall(periodic_mcheck_init);
@@ -597,12 +604,13 @@ static int mce_resume(struct sys_device
/* Reinit MCEs after user configuration changes */
static void mce_restart(void)
{
-   if (check_interval)
+   if (next_interval)
cancel_delayed_work(&mcheck_work);
/* Timer race is harmless here */
on_each_cpu(mce_init, NULL, 1, 1);
-   if (check_interval)
-   schedule_delayed_work(&mcheck_work, check_interval*HZ);
+   next_interval = check_interval * HZ;
+   if (next_interval)
+   schedule_delayed_work(&mcheck_work, next_interval);
}

static struct sysdev_class mce_sysclass = {

[PATCH] v2: utimensat implementation

2007-04-26 Thread Ulrich Drepper

Oops, little bug, I hooked-up the wrong syscall for the IA32 compat
code.  Fixed in this revision.

Signed-off-by: Ulrich Drepper <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index 796df69..12611c8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -714,9 +714,10 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad compat_sys_utimensat  /* 320 */
 ia32_syscall_end:  
diff --git a/fs/compat.c b/fs/compat.c
index 040a8be..1644cd1 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -79,28 +79,55 @@ int compat_printk(const char *fmt, ...)
  */
 asmlinkage long compat_sys_utime(char __user *filename, struct compat_utimbuf __user *t)
 {
-   struct timeval tv[2];
+   struct timespec tv[2];
 
if (t) {
if (get_user(tv[0].tv_sec, &t->actime) ||
get_user(tv[1].tv_sec, &t->modtime))
return -EFAULT;
-   tv[0].tv_usec = 0;
-   tv[1].tv_usec = 0;
+   tv[0].tv_nsec = 0;
+   tv[1].tv_nsec = 0;
}
return do_utimes(AT_FDCWD, filename, t ? tv : NULL);
 }
 
+asmlinkage long compat_sys_utimensat(unsigned int dfd, char __user *filename, struct compat_timespec __user *t)
+{
+   struct timespec tv[2];
+
+   if  (t) {
+   if (get_compat_timespec(&tv[0], &t[0]) ||
+   get_compat_timespec(&tv[1], &t[1]))
+   return -EFAULT;
+
+   if ((tv[0].tv_nsec == UTIME_OMIT || tv[0].tv_nsec == UTIME_NOW)
+   && tv[0].tv_sec != 0)
+   return -EINVAL;
+   if ((tv[1].tv_nsec == UTIME_OMIT || tv[1].tv_nsec == UTIME_NOW)
+   && tv[1].tv_sec != 0)
+   return -EINVAL;
+
+   if (tv[0].tv_nsec == UTIME_OMIT && tv[1].tv_nsec == UTIME_OMIT)
+   return 0;
+   }
+   return do_utimes(dfd, filename, t ? tv : NULL);
+}
+
 asmlinkage long compat_sys_futimesat(unsigned int dfd, char __user *filename, struct compat_timeval __user *t)
 {
-   struct timeval tv[2];
+   struct timespec tv[2];
 
if (t) {
if (get_user(tv[0].tv_sec, &t[0].tv_sec) ||
-   get_user(tv[0].tv_usec, &t[0].tv_usec) ||
+   get_user(tv[0].tv_nsec, &t[0].tv_usec) ||
get_user(tv[1].tv_sec, &t[1].tv_sec) ||
-   get_user(tv[1].tv_usec, &t[1].tv_usec))
+   get_user(tv[1].tv_nsec, &t[1].tv_usec))
return -EFAULT;
+   if (tv[0].tv_nsec > LONG_MAX / 1000
+   || tv[1].tv_nsec > LONG_MAX / 1000)
+   return -EINVAL;
+   tv[0].tv_nsec *= 1000;
+   tv[1].tv_nsec *= 1000;
}
return do_utimes(dfd, filename, t ? tv : NULL);
 }
diff --git a/fs/utimes.c b/fs/utimes.c
index 99cf2cb..dc6612e 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ out:
  * must be owner or have write permission.
  * Else, update from *times, must be owner or super user.
  */
-long do_utimes(int dfd, char __user *filename, struct timeval *times)
+long do_utimes(int dfd, char __user *filename, struct timespec *times)
 {
int error;
struct nameidata nd;
@@ -100,11 +101,21 @@ long do_utimes(int dfd, char __user *filename, struct timeval *times)
 if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
 goto dput_and_out;
 
-   newattrs.ia_atime.tv_sec = times[0].tv_sec;
-   newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
-   newattrs.ia_mtime.tv_sec = times[1].tv_sec;
-   newattrs.ia_mtime.tv_nsec = times[1].tv_usec * 1000;
-   newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET;
+   if (times[0].tv_nsec == UTIME_OMIT)
+   newattrs.ia_valid &= ~ATTR_ATIME;
+   else if (times[0].tv_nsec != UTIME_NOW) {
+   newattrs.ia_atime.tv_sec = times[0].tv_sec;
+   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+   newattrs.ia_valid |= ATTR_ATIME_SET;
+   }
+
+   if (times[1].tv_nsec == UTIME_OMIT)
+   newattrs.ia_valid &= ~ATTR_MTIME;
+   else if (times[1].tv_nsec != UTIME_NOW) {
+   newattrs.ia_mtime.tv_sec = times[1].tv_sec;
+   newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
+   newattrs.ia_valid |= ATTR_MTIME_SET;
+   }
}

Re: [patch] CFS scheduler, -v6

2007-04-26 Thread Con Kolivas

On Friday 27 April 2007 10:39, Gene Heskett wrote:
> Not necessarily Con.  Do you have a fresh one for 2.6.21?

Since people get nervous about any rejects here is an (otherwise identical) 
patch for 2.6.21

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-sd-0.46.patch

-- 
-ck

Re: [PATCH] utimensat implementation

2007-04-26 Thread H. Peter Anvin

Ulrich Drepper wrote:
> H. Peter Anvin wrote:
>> I'm a bit leery of abusing the timespec value like this, though.  A
>> flags field seem like it would be cleaner.
> 
> It's ugly.  Then you have the parameter, which might have nice valid
> values, and they get ignored.  I thought about it when this was
> discussed in the working group and thought it's a toss up.

It's pretty ugly either way :-/

>> Something else... if we're dickering with these interfaces, shouldn't we
>> allow setting atime as well?
> 
> Why?  To allow somebody to hide her/his tracks?

Primarily to let a backup program restore the full state of the filesystem.

-hpa

Re: linux-2.6.21 and __udivid3

2007-04-26 Thread l l


It attachs,

Here is another compilation failure.

make
 CHK include/linux/version.h
 CHK include/linux/utsrelease.h
 CHK include/linux/compile.h
 CC [M]  drivers/w1/w1.o
drivers/w1/w1.c: In function 'w1_slave_read_id':
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
drivers/w1/w1.c:118: error: cannot take address of bit-field 'family'
make[2]: *** [drivers/w1/w1.o] Error 1
make[1]: *** [drivers/w1] Error 2
make: *** [drivers] Error 2




I don't, i think i have to go back to gcc-4.2.0 which was fine with
linux-2.6.21-rc7.
It will be same to 2.6.21.


RE: W1 printk format warning

2007-04-26 Thread Williams, Dan J

> From: Andrew Morton [mailto:[EMAIL PROTECTED]
> On Thu, 26 Apr 2007 22:17:44 +0400 Evgeniy Polyakov
<[EMAIL PROTECTED]>
> wrote:
> 
> > On Wed, Apr 25, 2007 at 04:45:09PM -0700, Andrew Morton ([EMAIL PROTECTED]
> foundation.org) wrote:
> > > On Wed, 25 Apr 2007 16:21:04 -0700 Randy Dunlap
<[EMAIL PROTECTED]>
> wrote:
> > >
> > > > in 2.6.21-rc7-mm1.  Are you aware of this?
> > > >
> > > > drivers/w1/w1.c:460: warning: too few arguments for format
> > > >
> > > > dev_dbg(>dev, "%s: registering %s as %p.\n",
__func__,
> > > > >dev.bus_id[0]);
> > > >
> > >
> > > Yeah, that's because Dan's dev_dbg-check-dev_dbg-arguments.patch added
> > > printk arg-checking to dev_dbg() and a bunch of bugs got exposed.  I fixed
> > > a few of them.
> >
> > Could you post a full list for w1, if there is only one, here is a fix.
> >
> 
> There's only the one in w1, I think.  I fixed a few more but I didn't
> actually go looking for more such errors.

Feel free to bug me if there is more fallout that is not otherwise
picked up...

--
Dan

Re: [PATCH] utimensat implementation

2007-04-26 Thread Ulrich Drepper

H. Peter Anvin wrote:
> I'm a bit leery of abusing the timespec value like this, though.  A
> flags field seem like it would be cleaner.

It's ugly.  Then you have the parameter, which might have nice valid
values, and they get ignored.  I thought about it when this was
discussed in the working group and thought it's a toss up.

> Something else... if we're dickering with these interfaces, shouldn't we
> allow setting atime as well?

Why?  To allow somebody to hide her/his tracks?

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

signature.asc
Description: OpenPGP digital signature

Re: [PATCH] utimensat implementation

2007-04-26 Thread Ulrich Drepper

Andrew Morton wrote:
> Does the spec say what the OS should do if (ts_nsec >= 1e9)?

Yes, return EINVAL.  We already do this.  It's just that now we have to
recognize two special values.


> OK, so there's no collision on ts_nsec if unnormalised timespecs are
> disallowed.

Indeed, that's the basis of using the special values.

I chose the values of the constants so that they are a) out of the way
of valid values and b) don't have to be adjusted for 32-bit compat code.


> But there's a potential collision on ts_sec?  Do we know what date that
> corresponds to?

No, there is no collision.  The tv_sec value is relevant.  The
UTIME_OMIT and UTIME_NOW value refers to the atime/mtime respectively,
not just the tv_nsec field of either.  It makes no sense to just set
tv_sec, the tv_nsec value would be basically random.

In my patch I'm testing that tv_sec is zero in case any of the special
values is used in the corresponding tv_nsec field.  That's more than the
standard currently requires but I think it's better and I try to get the
standard proposal changed.  If this doesn't happen I'll make appropriate
changes at userlevel for the "strictly POSIX" mode.


> Do you have a testcase app which can be used by arch maintainers?

Attached here.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/time.h>

#define UTIME_NOW   ((1l << 30) - 1l)
#define UTIME_OMIT  ((1l << 30) - 2l)


int
main(void)
{
  int status = 0;

  int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
  if (fd == -1)
error (1, errno, "failed to create test file \"ttt\"");

  struct stat64 st1;
  if (fstat64 (fd, &st1) != 0)
error (1, errno, "fstat failed");

  struct timespec t[2];
  t[0].tv_sec = 0;
  t[0].tv_nsec = 0;
  t[1].tv_sec = 0;
  t[1].tv_nsec = 0;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
error (1, errno, "utimensat failed");

  struct stat64 st2;
  if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
  
  if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
{
  puts ("atim not reset to zero");
  status = 1;
}
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
  puts ("mtim not reset to zero");
  status = 1;
}
  if (status != 0)
goto out;

  t[0] = st1.st_atim;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_OMIT;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
  
  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
  || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
  puts ("atim not set");
  status = 1;
}
  if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
{
  puts ("mtim changed from zero");
  status = 1;
}
  if (status != 0)
goto out;

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_OMIT;
  t[1] = st1.st_mtim;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");
  
  if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
  || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
{
  puts ("atim changed from original time");
  status = 1;
}
  if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
  || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
{
  puts ("mtim not set");
  status = 1;
}
  if (status != 0)
goto out;

  sleep (2);

  t[0].tv_sec = 0;
  t[0].tv_nsec = UTIME_NOW;
  t[1].tv_sec = 0;
  t[1].tv_nsec = UTIME_NOW;
  if (syscall(280, AT_FDCWD, "ttt", t) != 0)
error (1, errno, "utimensat failed");

  if (fstat64 (fd, &st2) != 0)
error (1, errno, "fstat failed");

  struct timeval tv;
  gettimeofday(&tv,NULL);

  if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
  || st2.st_atim.tv_sec > tv.tv_sec)
{
  puts ("atim not set to NOW");
  status = 1;
}
  if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
  || st2.st_mtim.tv_sec > tv.tv_sec)
{
  puts ("mtim not set to NOW");
  status = 1;
}

  if (status == 0)
 puts ("all OK");

 out:
  close (fd);
  unlink ("ttt");

  return status;
}



Re: linux-2.6.21-rc7-mm2 fails to compile

2007-04-26 Thread Andrew Morton

On Fri, 27 Apr 2007 00:00:15 +0000 (GMT) William Heimbigner <[EMAIL PROTECTED]> wrote:

> Output leading up to the error:
> 
>CC  drivers/macintosh/macio-adb.o
>LD  drivers/macintosh/built-in.o
>CC [M]  drivers/macintosh/apm_emu.o
>CC [M]  drivers/macintosh/therm_windtunnel.o
> drivers/macintosh/therm_windtunnel.c: In function 'therm_of_remove':
> drivers/macintosh/therm_windtunnel.c:462: error: void value not ignored as it ought to be
> drivers/macintosh/therm_windtunnel.c:463: warning: control reaches end of non-void function
> make[2]: *** [drivers/macintosh/therm_windtunnel.o] Error 1
> make[1]: *** [drivers/macintosh] Error 2
> make: *** [drivers] Error 2
> 
> This is on an iMac G3 powerpc.

i2c_del_driver() was changed to return void in the i2c tree.

--- a/drivers/macintosh/therm_windtunnel.c~fix-jdelvare-i2c-i2c-del-driver-returns-void-on-powerpc
+++ a/drivers/macintosh/therm_windtunnel.c
@@ -459,7 +459,8 @@ therm_of_probe( struct of_device *dev, c
 static int
 therm_of_remove( struct of_device *dev )
 {
-   return i2c_del_driver( _driver );
+   i2c_del_driver( _driver );
+   return 0;
 }
 
 static struct of_device_id therm_of_match[] = {{
_


Re: [PATCH] change global zonelist order on NUMA v2

2007-04-26 Thread KAMEZAWA Hiroyuki

On Thu, 26 Apr 2007 17:57:40 -0400
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-04-26 at 18:34 +0900, KAMEZAWA Hiroyuki wrote:
> > Changelog from V1 -> V2
> > - sysctl name is changed to be relaxed_zone_order
> > - NORMAL->NORMAL->...->DMA->DMA->DMA order (new ordering) is now default.
> >   NORMAL->DMA->NORMAL->DMA order (old ordering) is optional.
> > - adds boot option to set relaxed_zone_order. ia64 is supported now.
> > - Added documentation
> > 
> > patch is against 2.6.21-rc7-mm2. tested on ia64 NUMA box. works well.
> 
> [PATCH] factor/rework change zonelist order patch
> 
> Against 2.6.21-rc7 atop KAMEZAWA Hiroyuki's "change global zonelist
> order on NUMA v2" patch.
> 
Hi, this looks 'easier-to-read' than mine. thanks.


> 3) kept early_param() definition for boot parameter in mm/page_alloc.c,
>along with the handler function.  One less file to modify.
> 
I put early_param() in the arch-dependent part just because no generic code
except for pci seems to call it. If it is allowed, I welcome this change.


> 4) modified the two Documentation additions to match these changes.
> 

> I've tested various combinations [non-exhaustive], with an ad hoc
> instrumentation patch, and it appears to work as expected [as I expect,
> anyway] on ia64 NUMA.
> 
> Question:  do we need to rebuild the zonelist caches when we reorder
>the zones?  The z_to_n[] array appears to be dependent on
>the zonelist order... 
> 
maybe no.


> Also:  I see the "Movable" zones show up in 21-rc7-mm2.  This patch
>will cause Movable zone to overflow to remote movable zones
>before using local Normal memory in non-default, zone order.
>Is this what we want?
> 
From my point of view, it's what I want. What we have to do is
establish a way to create ZONE_MOVABLE with a suitable size on each node.

I'll merge your change to my set and add "automatic detection" support.

Thank you.
-Kame


Re: [patch] CFS scheduler, -v6

2007-04-26 Thread Gene Heskett

On Thursday 26 April 2007, Con Kolivas wrote:
>On Friday 27 April 2007 00:41, Gene Heskett wrote:
>> On Thursday 26 April 2007, Redeeman wrote:
>> >On Wed, 2007-04-25 at 23:47 +0200, Ingo Molnar wrote:
>> >
>> >
>> >> As usual, any sort of feedback, bugreport, fix and suggestion is more
>> >> than welcome,
>> >
>> >well, from my experiences with cfs on workstation/desktop, on amd64
>> >2ghz, cfs doesn't measure up to SD at all.
>> >
>> >audio skips easily, with or without reniced X, and games are not as
>> >smooth. also it seems to be much lower throughput. vanilla seems much
>> >better by comparison.
>
>That's a pretty bad regression for CFS to not even play audio without
>stuttering.
>
>> Well, I don't generally feel as if the few stutters I just heard in a cbs
>> news story I just played were anything but dsl glitches here. xmms,
>> playing high quality (Q7) oggs from my own drive are dead smoothly done. 
>> nbc didn't want to let me play anything because I have adblock enabled, and
>> neither did abc but their anims played nicely, fox news smoothly played
>> everything I clicked on.
>>
>> The only problem I've had so far is with a proprietary daemon from belkin,
>> which took about 10 kills and restarts after I'd booted to 2.6.21-CFS-v6
>> before it quit wanting a steady 40% of the cpu when it was running.  Now
>> it only does that when the bulldog gui is running.  That's not normal
>> either, but getting fixes out of belkin is like breeding elephants, lots
>> of yelling and screaming and it takes 22 months to get results.  My next
>> ups will NOT be a belkin.
>>
>> X is not reniced here, but I did run kmail up to -5.
>>
>> Compared to mainline?  I still think this is a 100% keeper for desktop
>> users like me.
>
>Hi Gene
>
>Have you given up on SD testing then since it's stable and CFS is still
> shiny and new (and written by Ingo) therefore it must be better?

Not necessarily Con.  Do you have a fresh one for 2.6.21?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Been Transferred Lately?

Re: 2.6.21-rc7-mm2 -- x86_64 VDSO compile error II

2007-04-26 Thread Andi Kleen



> That's vdso.lds.S, not vdso.lds. The VDSO_TEXT_OFFSET is still 0x400.
> This is what I should have posted the last time
> 
> --- build/arch/x86_64/vdso/voffset.h.orig 2007-04-26 08:43:31.523739878 -0700
> +++ build/arch/x86_64/vdso/voffset.h  2007-04-26 08:43:38.839579356 -0700
> @@ -1 +1 @@
> -#define VDSO_TEXT_OFFSET 0x500
> +#define VDSO_TEXT_OFFSET 0x400

It's definitely 0x400 here. Is your patch reversed and you want 0x500? 

-Andi

Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code

2007-04-26 Thread James Bottomley

On Fri, 2007-04-27 at 02:13 +0200, Adrian Bunk wrote:
> On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote:
> > On Fri, 27 Apr 2007, Adrian Bunk wrote:
> > 
> > > This patch removes kernel 2.4 code.
> > >
> > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> > >
> > > ---
> > >
> > > This patch has been sent on:
> > > - 26 Mar 2007
> > >
> > >  drivers/scsi/nsp32.c |  109 +--
> > >  1 file changed, 13 insertions(+), 96 deletions(-)
> > >
> > > > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old 2007-03-25 20:27:34.0 +0200
> > > > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c 2007-03-25 20:31:59.0 +0200
> > > @@ -49,10 +49,6 @@
> > >  #include 
> > >  #include 
> > >
> > > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> > > -# include 
> > > -#endif
> > > -
> > 
> > i'm curious about the rules for removing code like this.  in the case
> > of drivers, isn't it possible that some driver source could be
> > relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
> > that kind of preprocessor check to make sure it's being compiled
> > appropriately?
> 
> That's what it was for.
> 
> > or are you doing something more sophisticated than simply checking the
> > kernel version being tested?
> 
> No.
> 
> The point is:
> 
> It seems this driver was once maintained for both 2.4 and 2.6 in one 
> file.
> 
> As long as this is done, such version checks are OK.

Personally, I don't like to see 2.4 and 2.6 in a new driver, and will
tend to try to force it to be 2.6  only.  For an existing driver, I tend
to be much more tolerant: removing the huge gobs of code to achieve 2.6
only is usually a bit disruptive to both the driver and the maintainer.

> But if a driver is no longer actually maintained for both kernels these 
> checks become useless (and unconditional 2.6-only code quickly arises 
> in such a driver) and can be removed.

This driver is maintained by 

Yokota Hiroshi <[EMAIL PROTECTED]>
GOTO Masanori <[EMAIL PROTECTED]>

As it says in the header.  It was last modified in May 2006, so it is
maintained under the somewhat elastic standards of SCSI.  I've cc'd them
to see what they think.

James



Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread William Lee Irwin III

William Lee Irwin III <[EMAIL PROTECTED]> writes:
>> In memory as on disk, contiguity matters a lot for performance.

On Thu, Apr 26, 2007 at 12:21:24PM -0600, Eric W. Biederman wrote:
> Not nearly so much though.  In memory you don't have seeks to avoid.
> On disks avoiding seeks is everything.

I readily concede that seeks are most costly. Yet memory contiguity
remains rather influential.

Witness the fact that I'm now being called upon a second time to
adjust the order in which mm/page_alloc.c returns pages for the
sake of implicitly establishing IO contiguity (or otherwise
determining why things are coming out backward now).

-- wli

Re: linux-2.6.21 and __udivid3

2007-04-26 Thread l l


Hi,

Sorry for the late reply, I was in bed.

> I assume this is a plain 2.6.21 from ftp.kernel.org?

Yes.

> Can you reproduce this with gcc 4.1?
> If yes, please send your .config .

I don't, I think I have to go back to gcc-4.2.0, which was fine with
linux-2.6.21-rc7.
It will be the same with 2.6.21.

TIA

Re: [PATCH] change global zonelist order on NUMA v2

2007-04-26 Thread KAMEZAWA Hiroyuki

On Thu, 26 Apr 2007 08:48:19 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 26 Apr 2007, KAMEZAWA Hiroyuki wrote:
> 
> > (1)Use new zonelist ordering always and move init_task's tied cpu to a
> >   cpu on the best node. 
> >   Child processes will start in good nodes even if Node 0 has small memory.
> 
> How about renumbering the nodes? Node 0 is the one with no DMA memory and 
> node 1 may be the one with the DMA? That would take care of things even 
> without core modifications. We can start on node 0 (which is hardware node 1) and 
> consume the required memory for boot there not impacting the node with the 
> DMA memory.
> 
It seems a bit complicated. If we do so, the following can occur:

Node1: cpu0,1,2,3
Node0: cpu4,5,6,7

The system layout would probably look confusing.

-Kame


RE: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHYconfiguration

2007-04-26 Thread Jeff Haran

> -Original Message-
> From: Benjamin Herrenschmidt [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, April 26, 2007 5:19 PM
> To: Jeff Haran
> Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/1] IBM PPC EMAC driver:improved support 
> for PHYconfiguration
> 
> On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote:
> > From: Jeff Haran  <[EMAIL PROTECTED]>
> > 
> > This patch fixes some problems I found while debugging the IBM EMAC 

...

> Your patch appears to have been line wrapped by your mailer though...

Benjamin,

Friggin Outlook and our M$ centered corporate IT environment. Even when
you tell it to send plain text it modifies it. This was my first attempt
at patch submission, so I don't know all the hoops yet.

I can try sending the patch again if I can figure out how to disable the
line wrap. Or if you have an FTP server I can upload to, I can send you
the patch file directly there.

Please let me know which you'd prefer.

Thanks,

Jeff

> 
> Cheers,
> Ben.
> 
> 
> 

Re: Linux 2.6.21

2007-04-26 Thread Linus Torvalds

On Fri, 27 Apr 2007, Thomas Gleixner wrote:
> 
> Maybe we need to coordinate changes better. 2.6.21 got three big updates
> which affected suspend/resume - one of them is my fault. But fiddling
> out which one of those - we had nested problems as well - makes it quite
> hard to grok them in time, especially if they happen only on one
> reporters system.

Yes. _If_ we had known how painful the timer changes would end up being, 
we'd probably have done them separately from everything else.

That is the kind of thing that looks obvious in hindsight: merge stuff 
that is questionable and scary alone, and don't do anything else that 
release cycle.

But while the timer code is obviously pretty core, I think everybody 
expected it to be a lot easier to merge (and it had existed as patches in 
various forms for some time).

So we simply didn't know beforehand that it was going to cause the kinds 
of regressions it did cause (and in fact, some of the regressions were 
initially blamed on other things entirely - some of them looked like IO 
regressions).

Water under the bridge. It's also easy to say in hindsight that something 
should have been merged separately and been given a release cycle all its 
own.

Linus

Re: Back to the future.

2007-04-26 Thread Olivier Galibert

On Thu, Apr 26, 2007 at 03:49:51PM -0700, David Lang wrote:
> swap partitions are limited to 2G (or at least they were a couple of months 
> ago when I last checked). I also don't want to run the risk of having a box 
> try to _use_ 16G worth of swap. I'd rather have the box hit OOM first.

They aren't limited anymore, I have a number of machines with 20G swap
for experiments.

  OG.


Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread William Lee Irwin III

On (26/04/07 20:39), Jens Axboe didst pronounce:
>> But don't push large pages as the only solution to larger ios, because
>> that is trivially not true.

On Thu, Apr 26, 2007 at 09:22:02PM +0100, Mel Gorman wrote:
> Would it be fair to say that your approach and using large pages are not
> mutually exclusive solutions? It seems a lot of the debate here is
> assuming there is One And Only One Solution for larger ios.

I'd like to see how all the strategies mentioned thus far do in
conjunction.


-- wli

Re: [PATCH 1/1] IBM PPC EMAC driver:improved support for PHY configuration

2007-04-26 Thread Benjamin Herrenschmidt

On Thu, 2007-04-26 at 16:18 -0700, Jeff Haran wrote:
> From: Jeff Haran  <[EMAIL PROTECTED]>
> 
> This patch fixes some problems I found while debugging the IBM EMAC
> driver for PPC32 systems.
> The first problem was in the function that configures the PHY for
> autonegotiation, genmii_setup_aneg(). The original code does a
> read/modify/write of the autonegotiation advertisement register (reg 4),
> followed by a read/modify/write of the control register (reg 0). While
> the original code follows the proper procedure as per reading the IEEE
> specs, what I found is that on at least one PHY model (National DP83843)
> the read of the control register comes back with the soft reset bit set
> (bit 15).

Good catch ! I've seen that behaviour in the past too. Note that sungem
has this problem too.

 .../...

> The second problem was in the function that configures the PHY for
> forced operation, genmii_setup_forced(). The original code initiates a
> software reset operation via a write of a 1 to bit 15 of the control
> register (reg 0), but then proceeds to do a second write to that same
> register without waiting until that reset bit is cleared by the PHY
> itself (which according to the IEEE specs indicates that the PHY reset
> is complete). This is a violation of how one is supposed to use this
> software reset feature of these PHYs and I believe was the cause of
> mysterious, difficult to reproduce link failures that we've observed on
> some of our systems that use this driver. The fix is to modify the
> function so that it spins waiting for the reset bit to clear after doing
> the soft reset and before doing the subsequent write. Since this
> modification, we haven't seen the mysterious link failures, though they
> were so rare it's difficult to say at this point whether this was the
> cause.

This is also a bug inherited from sungem (thus my fault).
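
For readers following along, the fix described above (poll until the PHY
clears the reset bit before touching reg 0 again) could look roughly like
the sketch below. It is not the code from Jeff's patch. The register and
bit values follow the IEEE layout quoted above (reg 0, bit 15); the
phy_ops accessors, the fake PHY and the poll limit are hypothetical names
invented for this illustration.

```c
#include <assert.h>

#define MII_BMCR    0x00    /* control register (reg 0) */
#define BMCR_RESET  0x8000  /* bit 15: soft reset, cleared by the PHY when done */

/* Hypothetical MDIO accessors; a real driver supplies its own. */
struct phy_ops {
    int  (*read)(void *ctx, int reg);
    void (*write)(void *ctx, int reg, int val);
    void *ctx;
};

/*
 * Start a soft reset, then poll reg 0 until the PHY clears bit 15.
 * Returns 0 once the bit is clear, -1 if it is still set after max_polls
 * reads (the abnormal case the patch description mentions).
 */
int phy_soft_reset(const struct phy_ops *ops, int max_polls)
{
    assert(ops && ops->read && ops->write);

    ops->write(ops->ctx, MII_BMCR, BMCR_RESET);
    while (max_polls-- > 0) {
        int val = ops->read(ops->ctx, MII_BMCR);

        if (val >= 0 && !(val & BMCR_RESET))
            return 0;
        /* a real driver would sleep or delay between polls */
    }
    return -1;
}

/* Fake PHY for illustration: the reset bit stays set for busy_reads reads. */
struct fake_phy {
    int bmcr;
    int busy_reads;
};

int fake_read(void *ctx, int reg)
{
    struct fake_phy *p = ctx;

    if (reg != MII_BMCR)
        return 0;
    if ((p->bmcr & BMCR_RESET) && p->busy_reads-- <= 0)
        p->bmcr &= ~BMCR_RESET;
    return p->bmcr;
}

void fake_write(void *ctx, int reg, int val)
{
    struct fake_phy *p = ctx;

    if (reg == MII_BMCR)
        p->bmcr = val;
}
```

Only after phy_soft_reset() returns 0 would the caller go on to write the
forced speed/duplex settings into reg 0.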

> I also added some error handling and reporting for the abnormal case
> where the reset bit never clears from the soft reset operation.
> Applied to kernel version 2.6.21.

Your patch appears to have been line wrapped by your mailer though...

Cheers,
Ben.




Re: [00/17] Large Blocksize Support V3

2007-04-26 Thread Jeremy Higdon

On Thu, Apr 26, 2007 at 11:50:33PM +1000, David Chinner wrote:
> On Thu, Apr 26, 2007 at 04:10:32AM -0600, Eric W. Biederman wrote:
> > > And then there's the problem that most hardware is limited to 128
> > > s/g entries and that means 128 non-contiguous pages in memory is the
> > > maximum I/O size we can issue to these devices. We have RAID arrays
> > > that go twice as fast if we can send them 1MB I/Os instead of 512k
> > > I/Os and that means we need contiguous pages to be handled to the
> > > devices
> > 
> > Ok.  Now why are high end hardware manufacturers building crippled
> > hardware?  Or is there only an 8bit field in SCSI for describing
> > scatter gather entries?  Although I would think this would be
> > more of a controller rather than a drive issue.
> 
> scsi.h:
> 
> /*
>  *  The maximum sg list length SCSI can cope with
>  *  (currently must be a power of 2 between 32 and 256)
>  */
> #define SCSI_MAX_PHYS_SEGMENTS  MAX_PHYS_SEGMENTS
> 
> And from blkdev.h:
> 
> #define MAX_PHYS_SEGMENTS 128
> #define MAX_HW_SEGMENTS 128
> 
> So currently on SCSI we are limited to 128 s/g entries, and the
> maximum is 256.  So I'd say we've got good grounds for needing
> contiguous pages to go beyond 1MB I/O size on x86_64.

Right, and there are also RAID devices that really want a 2 MiB I/O
size.  Even if we could use 512 s/g entries (which would take two
pages), the other big problem is that many I/O chips/cards are limited
in the amount of space they have for s/g lists.  So, you'd face the
possibility that you could do a 2MiB I/O request with 512 s/g entries,
but then you couldn't start a second request on that host until the
first one finished.
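
The arithmetic behind these numbers, as a rough sketch. It assumes 4 KiB
pages and one s/g entry per non-contiguous page; the 16-byte entry size is
only inferred from "512 s/g entries (which would take two pages)" and is
an assumption, as are the helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Largest I/O when every s/g entry maps exactly one non-contiguous page. */
size_t max_io_bytes(size_t sg_entries, size_t page_size)
{
    assert(page_size != 0);
    return sg_entries * page_size;
}

/* Number of pages needed to hold the s/g list itself (rounded up). */
size_t sg_list_pages(size_t sg_entries, size_t entry_size, size_t page_size)
{
    assert(page_size != 0);
    return (sg_entries * entry_size + page_size - 1) / page_size;
}
```

With 4 KiB pages, max_io_bytes(128, 4096) is 512 KiB, 256 entries give
1 MiB, and 512 entries give 2 MiB. With contiguous 64 KiB blocks, the same
128 entries would cover 8 MiB.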

jeremy

Re: [-mm patch] unexport highlevel_host_reset

2007-04-26 Thread Stefan Richter

Adrian Bunk wrote:
> highlevel_host_reset no longer has any modular users.

Thanks, I missed this when I removed the last usage outside the 1394
core.  Committed to linux1394-2.6.git.
-- 
Stefan Richter
-=-=-=== -=-- ==-==
http://arcgraph.de/sr/

Re: [ANNOUNCE] battery2 git repository

2007-04-26 Thread Greg KH

On Fri, Apr 27, 2007 at 03:29:02AM +0400, Anton Vorontsov wrote:
> 
> To avoid confusion with David's own battery-2.6 repository, it's called
> battery2-2.6, and can be found here:

So, I'm still not quite understanding this, is David giving up on his
code now that you have pulled his changes into your tree?  Or is there
some reason there are still two competing versions here?

thanks,

greg k-h

Re: mm snapshot broken-out-2007-04-25-02-49.tar.gz uploaded

2007-04-26 Thread Greg KH

On Thu, Apr 26, 2007 at 11:21:35PM +0200, Michal Piotrowski wrote:
> Hi Greg,
> 
> Michal Piotrowski wrote:
> > Hi all,
> > 
> > [EMAIL PROTECTED] wrote:
> >> The mm snapshot broken-out-2007-04-25-02-49.tar.gz has been uploaded to
> >>
> >>
> >> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-04-25-02-49.tar.gz
> >>
> >> It contains the following patches against 2.6.21-rc7:
> >>
> > 
> > My network doesn't work
> > 
> > $ ping 192.168.0.1
> > connect: Network is unreachable
> > 
> > NIC: 
> > 02:05.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T 
> > [Marvell] (rev 12)
> > 
> > quilt patches drivers/net/skge.c
> > patches/git-netdev-all.patch
> > patches/git-net.patch
> > 
> > Bisection...
> > 
> 
> I tracked this problem down to your patches (kernel 2.6.21-rc7-mm2)
> 
> gregkh-driver-sysfs-fix-i_ino-handling-in-sysfs.patch
> gregkh-driver-sysfs-fix-error-handling-in-binattr-write.patch
> gregkh-driver-sysfs-move-release_sysfs_dirent-to-dirc.patch
> gregkh-driver-sysfs-flatten-cleanup-paths-in-sysfs_add_link-and-create_dir.patch
> gregkh-driver-sysfs-consolidate-sysfs_dirent-creation-functions.patch
> gregkh-driver-sysfs-add-sysfs_dirent-s_parent.patch
> gregkh-driver-sysfs-add-sysfs_dirent-s_name.patch
> gregkh-driver-sysfs-make-sysfs_dirent-s_element-a-union.patch
> gregkh-driver-sysfs-implement-kobj_sysfs_assoc_lock.patch
> gregkh-driver-sysfs-reimplement-symlink-using-sysfs_dirent-tree.patch
> gregkh-driver-sysfs-implement-bin_buffer.patch
> gregkh-driver-sysfs-implement-sysfs_dirent-active-reference-and-immediate-disconnect.patch
> gregkh-driver-sysfs-kill-attribute-file-orphaning.patch
> gregkh-driver-sysfs-kill-unnecessary-attribute-owner.patch
> gregkh-driver-sysfs-make-lockdep-ignore-s_active.patch
> gregkh-driver-sysfs-make-sysfs_put-ignore-null-sd.patch
> gregkh-driver-sysfs-rename-object_depth-to-sysfs_path_depth-and-make-it-global.patch
> gregkh-driver-sysfs-reimplement-sysfs_drop_dentry.patch
> gregkh-driver-sysfs-kill-sysfs_dirent-s_dentry.patch
> gregkh-driver-driver-core-make-uevent-environment-available-in-uevent-file.patch
> 
> ABI breakage?

I've dropped the gregkh-driver-sysfs-* patches listed above from my tree
as Tejun is reworking the series to handle all of the different changes
that have been needed as they have been tested, so you shouldn't have to
worry about them.

But I would be interested in finding out why the above is breaking for
you.

thanks,

greg k-h

Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code

2007-04-26 Thread Adrian Bunk

On Thu, Apr 26, 2007 at 07:59:57PM -0400, Robert P. J. Day wrote:
> On Fri, 27 Apr 2007, Adrian Bunk wrote:
> 
> > This patch removes kernel 2.4 code.
> >
> > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> >
> > ---
> >
> > This patch has been sent on:
> > - 26 Mar 2007
> >
> >  drivers/scsi/nsp32.c |  109 +--
> >  1 file changed, 13 insertions(+), 96 deletions(-)
> >
> > --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old   2007-03-25 
> > 20:27:34.0 +0200
> > +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c   2007-03-25 
> > 20:31:59.0 +0200
> > @@ -49,10 +49,6 @@
> >  #include 
> >  #include 
> >
> > -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> > -# include 
> > -#endif
> > -
> 
> i'm curious about the rules for removing code like this.  in the case
> of drivers, isn't it possible that some driver source could be
> relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
> that kind of preprocessor check to make sure it's being compiled
> appropriately?

That's what it was for.

> or are you doing something more sophisticated than simply checking the
> kernel version being tested?

No.

The point is:

It seems this driver was once maintained for both 2.4 and 2.6 in one 
file.

As long as this is done, such version checks are OK.

But if a driver is no longer actually maintained for both kernels, these 
checks become useless (unconditional 2.6-only code quickly arose in such 
a driver) and can be removed.
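
For reference, a minimal user-space sketch of how such a version check
selects code at compile time. The KERNEL_VERSION macro below mirrors the
usual definition from linux/version.h; the LINUX_VERSION_CODE value and
the two helper functions are assumptions made for this illustration, not
part of the nsp32 driver.

```c
#include <assert.h>

/* Packs a version triple into one integer, as in linux/version.h. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumed build target for this illustration: 2.6.21. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 21)
#endif

/* Run-time view of the same packing, handy for comparing versions. */
int version_code(int major, int minor, int patch)
{
    assert(minor >= 0 && minor < 256 && patch >= 0 && patch < 256);
    return KERNEL_VERSION(major, minor, patch);
}

/* Returns 1 if the 2.4-only branch was compiled in, 0 otherwise. */
int built_with_24_compat(void)
{
#if (LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 0))
    return 1;   /* 2.4 compatibility code would live here */
#else
    return 0;   /* 2.6-only build: the branch above is dead code */
#endif
}
```

Once the file also contains unconditional 2.6-only code, the first branch
can never be built usefully, which is why it can go.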

> rday

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [PATCH] utimensat implementation

2007-04-26 Thread H. Peter Anvin

Andrew Morton wrote:
> On Thu, 26 Apr 2007 18:49:05 -0400 Ulrich Drepper <[EMAIL PROTECTED]> wrote:
>>
>> If the tv_nsec value of either of the elements of the utimes parameter to 
>> utimensat() is UTIME_OMIT no update of that respective value is performed.
> 
> ITYM "If the value of either of the elements..."
> 
> +#define UTIME_NOW    ((1l << 30) - 1l)
> +#define UTIME_OMIT   ((1l << 30) - 2l)
> 
> OK, so there's no collision on tv_nsec if unnormalised timespecs are
> disallowed.
> 
> But there's a potential collision on tv_sec?  Do we know what date that
> corresponds to?

"If the tv_nsec value" implies that these magic numbers have no impact
on these.
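
A small sketch of why the two magic values cannot collide with a
normalised tv_nsec: both lie above 999999999. The two macros are copied
from the quoted patch; NSEC_PER_SEC_L, the enum and the helper functions
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define UTIME_NOW   ((1l << 30) - 1l)   /* 1073741823 */
#define UTIME_OMIT  ((1l << 30) - 2l)   /* 1073741822 */

#define NSEC_PER_SEC_L 1000000000l

/* A normalised tv_nsec lies in [0, 999999999]. */
bool nsec_is_normalised(long nsec)
{
    return nsec >= 0 && nsec < NSEC_PER_SEC_L;
}

enum ts_action { TS_SET, TS_NOW, TS_OMIT, TS_INVALID };

/* Decide what to do with one timestamp based only on its tv_nsec field. */
enum ts_action classify_nsec(long tv_nsec)
{
    if (tv_nsec == UTIME_NOW)
        return TS_NOW;
    if (tv_nsec == UTIME_OMIT)
        return TS_OMIT;
    return nsec_is_normalised(tv_nsec) ? TS_SET : TS_INVALID;
}

/* Both magic values fall outside the normalised range. */
void check_no_collision(void)
{
    assert(!nsec_is_normalised(UTIME_NOW));
    assert(!nsec_is_normalised(UTIME_OMIT));
}
```

Since only tv_nsec is inspected, tv_sec keeps its full range.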

I'm a bit leery of abusing the timespec value like this, though.  A
flags field seems like it would be cleaner.

Something else... if we're dickering with these interfaces, shouldn't we
allow setting atime as well?

-hpa

Re: Back to the future.

2007-04-26 Thread Olivier Galibert

On Fri, Apr 27, 2007 at 06:50:56AM +1000, Nigel Cunningham wrote:
> I'm perfectly willing to think through some alternate approach if you
> suggest something or prod my thinking in a new direction, but I'm afraid
> I just can't see right now how we can achieve what you're after.

Ok, what about this approach I've been mulling over for a while:

Suspend-to-disk is pretty much an exercise in state saving.  There are
multiple ways to do state saving, but they tend to end up in two
categories: implicit and explicit.

In implicit state saving, you try to save the state of the
system/application/whatever "under its feet", more or less, and then
fix up what is not saved/saveable correctly.  A well-known example is
the undumping process Emacs goes (went?) through, where it tries to dump
the state of the memory as a new executable, with a lot of pleasure with
various executable formats and subtleties due to side effects in libc
code you don't control.

In explicit state saving each object saves what is needed from its
state to an independently defined format (instead of "whatever the
memory organization happens to be at that point").  When reloading the
state you have to parse it, and it usually requires
rebuilding/relocating all references/pointers/etc.  XEmacs currently
has a "portable dumper" that pretty much does just that.  We don't
have any redumping problems anymore, they're over.

Which one is the best depends heavily on the application.  The amount
of code in the implicit case depends on the amount of fixups to do.
In the kernel case it happens to be a lot, pretty much everything that
touches hardware has to save to memory the device state and reload it
on resume.  And bugs on hardware handling can be quite annoying to
debug.  And if some driver does not do saving/resume correctly, you
have no way outside of playing with modules to ensure the safety of
the suspend cycle.

The amount of code in the explicit case is an interesting variable in
the case of the kernel.  You have to save what is needed, but how do
you define what is needed?  It is, pretty much, what running processes
can observe from userspace.  Now, what can a process observe:
- its application text and anonymous memory pages
- its file handles
- its mapped files
- its mapped whatever else
- its sys5 IPC stuff
- futex stuff and friends, namespaces, etc
- its intrinsic characteristics it can reach through syscalls
  (i.e. the user-visible parts of current, like pid, uid...)
- its currently running system call, if any

So that's what we'd have to explicitly save.  Anonymous memory, sys5
IPC, futex and current structures, that's easy stuff in practice.  The
fun part are pretty much:
- references to files
- references to active networking links
- references to devices and associated visible state
- currently running system call, aka the kernel stack for the process

The last one is the one I'm the most afraid of.  I hope that the
signal stuff and/or the asynchronous syscall stuff that was discussed
recently would allow us to "unwind" blocking system calls back to the
syscall level and then store the parameters for resume-time restart.
The non-blocking calls you can just let finish.

The first one is really interesting.  If you value your filesystems,
you'd rather have them clean after the suspend.  And also you pretty
much know that filesystems can move around when you're not looking, be
it USB hotplug stuff (discovery order is random-ish isn't it?), module
loading order issues or multithreaded device discovery.  So you're way
more happy *not* caching anything from the filesystem you can avoid.

But what is a file reference, really?  With the dcache handy, it's
pretty much a path, since inodes don't always exist reliably.  And if
you have the lists of paths used by the processes on a particular
filesystem, you can easily get an idea of where, if anywhere, the
filesystem is even if you don't have reliable serials.  More
interestingly, you cannot, in any case, instantly corrupt your
filesystem by having a mismatch between the in-memory cache and the
reality.

The processes which referenced files you can't find anywhere will
end-up with EBADF or segfault depending on whether it was fd or mmap,
ala revoke().  They'll probably die horribly.  I'd rather have
processes die than filesystems die, since in any case if the file
isn't here anymore in practice the process could only destroy things.

An interesting thing there: nothing in that touches either the
filesystem or the block devices.  Everything is done at the VFS level.
The devices don't need to care.  And the "this filesystem goes there"
can be done in userspace in an initramfs if people want to experiment
with kinky strategies.  After all, why not allow a sysadmin to regroup
two filesystems into one through a suspend, the processes mostly don't
need to care (well, tar may, but heh).  Deleted files would have to be
sillyrenamed or something.  Implementation details ;-)

Active networking links, you can

Re: [2.6 patch] drivers/scsi/nsp32.c: remove kernel 2.4 code

2007-04-26 Thread Robert P. J. Day

On Fri, 27 Apr 2007, Adrian Bunk wrote:

> This patch removes kernel 2.4 code.
>
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
>
> ---
>
> This patch has been sent on:
> - 26 Mar 2007
>
>  drivers/scsi/nsp32.c |  109 +--
>  1 file changed, 13 insertions(+), 96 deletions(-)
>
> --- linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c.old 2007-03-25 
> 20:27:34.0 +0200
> +++ linux-2.6.21-rc4-mm1/drivers/scsi/nsp32.c 2007-03-25 20:31:59.0 
> +0200
> @@ -49,10 +49,6 @@
>  #include 
>  #include 
>
> -#if (LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0))
> -# include 
> -#endif
> -

i'm curious about the rules for removing code like this.  in the case
of drivers, isn't it possible that some driver source could be
relevant for both the 2.4 and 2.6 kernel source tree, and simply uses
that kind of preprocessor check to make sure it's being compiled
appropriately?

or are you doing something more sophisticated than simply checking the
kernel version being tested?

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page

