Re: system without RAM on node0 boot fail

2008-02-01 Thread dean gaudet
actually yeah i've seen this... in a bizarre failure situation in a system 
which physically had RAM in the boot node but it was never enumerated for 
the kernel (other nodes had RAM which was enumerated).

so technically there was boot node RAM but the kernel never saw it.

-dean

On Wed, 30 Jan 2008, Christoph Lameter wrote:

> x86 supports booting from a node without RAM?


Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

2008-02-01 Thread dean gaudet
why do we need another kernel cpuid reading method when sched_setaffinity 
exists and cpuid is available in ring3?
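
for reference, the userland recipe is just pin-then-query; a minimal
sketch (assumes gcc's <cpuid.h>, and the choice of cpu 2 is arbitrary):

/* pin to one cpu, then execute cpuid directly in ring3 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    cpu_set_t set;
    unsigned int eax, ebx, ecx, edx;

    CPU_ZERO(&set);
    CPU_SET(2, &set);                       /* pick the cpu to query */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return 1;

    __get_cpuid(1, &eax, &ebx, &ecx, &edx); /* leaf 1: feature flags */
    printf("cpu 2: eax=%08x ecx=%08x edx=%08x\n", eax, ecx, edx);
    return 0;
}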

-dean


Re: [PATCH] x86: add PCI IDs to k8topology_64.c II

2008-02-01 Thread dean gaudet
On Tue, 29 Jan 2008, Andi Kleen wrote:

> > SRAT is essentially just a two dimensional table with node distances.
> 
> Sorry, that was actually SLIT. SRAT is not two dimensional, but also
> relatively simple. SLIT you don't really need to implement.

yeah but i'd heartily recommend implementing SLIT too.  mind you, its 
almost universal non-existence means i've had to resort to userland 
measurements to determine node distances, and that won't change.  i guess 
i just wanted to grumble somewhere.
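
for what it's worth, the kernel exports whichever node distances it ended
up with via sysfs; a sketch of reading the node0 row (path assumed
present on a NUMA box, and the values are SLIT-derived only when firmware
actually provides a SLIT):

/* dump node0's row of the kernel's node distance table */
#include <stdio.h>

int main(void)
{
    char buf[256];
    FILE *f = fopen("/sys/devices/system/node/node0/distance", "r");

    if (!f)
        return 1;
    if (fgets(buf, sizeof(buf), f))
        printf("node0 distances: %s", buf);  /* e.g. "10 20" on 2 nodes */
    fclose(f);
    return 0;
}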

-dean


Re: Fast network file copy; "recvfile()" ?

2008-01-21 Thread dean gaudet
On Thu, 17 Jan 2008, Patrick J. LoPresti wrote:

> I need to copy large (> 100GB) files between machines on a fast
> network.  Both machines have reasonably fast disk subsystems, with
> read/write performance benchmarked at > 800 MB/sec. Using 10GigE cards
> and the usual tweaks to tcp_rmem etc., I am getting single-stream TCP
> throughput better than 600 MB/sec.
> 
> My question is how best to move the actual file.  NFS writes appear to
> max out at a little over 100 MB/sec on this configuration.

did your "usual tweaks" include mounting with -o tcp,rsize=262144,wsize=262144 ?

i should have kept better notes last time i was experimenting with this,
but from memory here's what i found:

- if i used three NFS clients and was reading from page cache on the
  server i hit 1.2GB/s total throughput from the server.  the client
  NFS code was maxing out one CPU on each of the client machines.

- disk subsystem (sw raid10 far2) was capable of 600MB/s+ when read
  locally on the NFS server, but topped out around ~250MB/s when read
  remotely (no matter how many clients).

my workload was read-intensive so i didn't experiment with writes...

-dean


Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
On Tue, 15 Jan 2008, Andrew Morton wrote:

> On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> 
> wrote:
> 
> > On Mon, 14 Jan 2008, NeilBrown wrote:
> > 
> > > 
> > > raid5's 'make_request' function calls generic_make_request on
> > > underlying devices and if we run out of stripe heads, it could end up
> > > waiting for one of those requests to complete.
> > > This is bad as recursive calls to generic_make_request go on a queue
> > > and are not even attempted until make_request completes.
> > > 
> > > So: don't make any generic_make_request calls in raid5 make_request
> > > until all waiting has been done.  We do this by simply setting
> > > STRIPE_HANDLE instead of calling handle_stripe().
> > > 
> > > If we need more stripe_heads, raid5d will get called to process the
> > > pending stripe_heads which will call generic_make_request from a
> > > different thread where no deadlock will happen.
> > > 
> > > 
> > > This change by itself causes a performance hit.  So add a change so
> > > that raid5_activate_delayed is only called at unplug time, never in
> > > raid5.  This seems to bring back the performance numbers.  Calling it
> > > in raid5d was sometimes too soon...
> > > 
> > > Cc: "Dan Williams" <[EMAIL PROTECTED]>
> > > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> > 
> > probably doesn't matter, but for the record:
> > 
> > Tested-by: dean gaudet <[EMAIL PROTECTED]>
> > 
> > this time i tested with internal and external bitmaps and it survived 8h 
> > and 14h resp. under the parallel tar workload i used to reproduce the 
> > hang.
> > 
> > btw this should probably be a candidate for 2.6.22 and .23 stable.
> > 
> 
> hm, Neil said
> 
>   The first fixes a bug which could make it a candidate for 24-final. 
>   However it is a deadlock that seems to occur very rarely, and has been in
>   mainline since 2.6.22.  So letting it into one more release shouldn't be
>   a big problem.  While the fix is fairly simple, it could have some
>   unexpected consequences, so I'd rather go for the next cycle.
> 
> food fight!
> 

heheh.

it's really easy to reproduce the hang without the patch -- i could
hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
i'll try with ext3... Dan's experiences suggest it won't happen with ext3
(or is even more rare), which would explain why this is overall a
rare problem.

but it doesn't result in data loss or permanent system hangups as long
as you can become root and raise the size of the stripe cache...
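
for anyone who hits it, the knob is the md stripe cache in sysfs, along
these lines (device name and value illustrative):

  echo 8192 > /sys/block/md0/md/stripe_cache_size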

so OK i agree with Neil, let's test more... food fight over! :)

-dean


Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
On Mon, 14 Jan 2008, NeilBrown wrote:

> 
> raid5's 'make_request' function calls generic_make_request on
> underlying devices and if we run out of stripe heads, it could end up
> waiting for one of those requests to complete.
> This is bad as recursive calls to generic_make_request go on a queue
> and are not even attempted until make_request completes.
> 
> So: don't make any generic_make_request calls in raid5 make_request
> until all waiting has been done.  We do this by simply setting
> STRIPE_HANDLE instead of calling handle_stripe().
> 
> If we need more stripe_heads, raid5d will get called to process the
> pending stripe_heads which will call generic_make_request from a
> different thread where no deadlock will happen.
> 
> 
> This change by itself causes a performance hit.  So add a change so
> that raid5_activate_delayed is only called at unplug time, never in
> raid5.  This seems to bring back the performance numbers.  Calling it
> in raid5d was sometimes too soon...
> 
> Cc: "Dan Williams" <[EMAIL PROTECTED]>
> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

probably doesn't matter, but for the record:

Tested-by: dean gaudet <[EMAIL PROTECTED]>

this time i tested with internal and external bitmaps and it survived 8h 
and 14h resp. under the parallel tar workload i used to reproduce the 
hang.

btw this should probably be a candidate for 2.6.22 and .23 stable.

thanks
-dean


nosmp/maxcpus=0 or 1 -> TSC unstable

2008-01-12 Thread dean gaudet
if i boot an x86 64-bit 2.6.24-rc7 kernel with nosmp, maxcpus=0 or 1 it 
still disables TSC :)

Marking TSC unstable due to TSCs unsynchronized

this is an opteron 2xx box which does have two cpus, but with no 
clock-divide in halt and no cpufreq enabled, so TSC should be fine with 
only one cpu.

pretty sure the culprit is that num_possible_cpus() > 1, which would
mean cpu_possible_map contains the second cpu... but i'm not quite
sure what the right fix is... or perhaps this is all intended.

-dean


Re: CPA patchset

2008-01-11 Thread dean gaudet
On Fri, 11 Jan 2008, dean gaudet wrote:

> On Fri, 11 Jan 2008, Ingo Molnar wrote:
> 
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> > 
> > > Cached requires the cache line to be read first before you can write 
> > > it.
> > 
> > nonsense, and you should know it. It is perfectly possible to construct 
> > fully written cachelines, without reading the cacheline first. MOVDQ is 
> > SSE1 so on basically in every CPU today - and it is 16 byte aligned and 
> > can generate full cacheline writes, _without_ filling in the cacheline 
> > first.
> 
> did you mean to write MOVNTPS above?

btw in case you were thinking of a normal store to WB memory rather than 
a non-temporal store... i ran a microbenchmark streaming stores to every 
16 bytes of a 16MiB region aligned to 4096 bytes on a xeon 53xx series 
CPU (4MiB L2) + 5000X northbridge, and the avg latency of MOVNTPS is 12 
cycles whereas the avg latency of MOVAPS is 20 cycles.

the inner loop is unrolled 16 times so there are literally 4 cache lines 
worth of stores being stuffed into the store queue as fast as possible... 
and there's no coalescing for normal stores even on this modern CPU.

i'm certain i'll see the same thing on AMD... it's a very hard thing to do 
in hardware without the non-temporal hint.
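
the shape of that loop, reconstructed as a sketch with sse intrinsics --
unrolled only 4x here and minus the rdtsc timing harness, so not the
original benchmark code; swap _mm_stream_ps for _mm_store_ps to flip
between MOVNTPS and MOVAPS:

/* store to every 16 bytes of a large 16-byte-aligned region */
#include <stddef.h>
#include <xmmintrin.h>

static void fill(float *p, size_t nfloats)
{
    __m128 v = _mm_setzero_ps();
    size_t i;

    for (i = 0; i + 16 <= nfloats; i += 16) {
        /* four 16-byte stores per iteration = one 64-byte cacheline */
        _mm_stream_ps(p + i +  0, v);
        _mm_stream_ps(p + i +  4, v);
        _mm_stream_ps(p + i +  8, v);
        _mm_stream_ps(p + i + 12, v);
    }
    _mm_sfence();   /* order/flush the non-temporal stores */
}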

-dean


Re: CPA patchset

2008-01-11 Thread dean gaudet
On Fri, 11 Jan 2008, Ingo Molnar wrote:

> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > Cached requires the cache line to be read first before you can write 
> > it.
> 
> nonsense, and you should know it. It is perfectly possible to construct 
> fully written cachelines, without reading the cacheline first. MOVDQ is 
> SSE1 so on basically in every CPU today - and it is 16 byte aligned and 
> can generate full cacheline writes, _without_ filling in the cacheline 
> first.

did you mean to write MOVNTPS above?


> Bulk ops (string ops, etc.) will do full cacheline writes too, 
> without filling in the cacheline.

on intel with fast strings enabled, yes.  mind you, intel hints in the
documentation that these operations don't respect coherence... and i
asked about this when they posted their memory ordering paper but got no
response.

-dean


Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-29 Thread dean gaudet
On Sat, 29 Dec 2007, [EMAIL PROTECTED] wrote:

> On Sat, 29 Dec 2007 12:40:47 PST, dean gaudet said:
> 
> > the main worry i have is some user maliciously hardlinks everything
> > under /var/log somewhere else and slowly fills up the file system with
> > old rotated logs.
> 
> "Doctor, it hurts when I do this.." "Well, don't do that then".

actually it doesn't hurt.  i have other mechanisms which would pick this 
up fairly quickly.

-dean


Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-29 Thread dean gaudet


On Sun, 30 Dec 2007, David Newall wrote:

> dean gaudet wrote:
> > > Pffuff.  That's what volume managers are for!  You do have (at least) two
> > > independent spindles in your RAID1 array, which give you less need to
> > > worry
> > > about head-stack contention.
> > > 
> > 
> > this system is write intensive and writes go to all spindles, so your
> > assertion is wrong.
> 
> I don't know what you think I was asserting, but you were wrong.  Of course
> I/O is distributed across both spindles.  You would expect no less.  THAT is
> what I was telling you.

are you on crack?

it's a raid1.  writes go to all spindles.  they have to.  by definition.  
reads can be spread around, but writes are mirrored.

> 
> > the main worry i have is some user maliciously hardlinks everything
> > under /var/log somewhere else and slowly fills up the file system with
> > old rotated logs.  the users otherwise have quotas so they can't fill
> > things up on their own.  i could probably set up XFS quota trees (aka
> > "projects") but haven't gone to this effort yet.
> >   
> 
> See, this is where you show that you don't understand the system.  I'll
> explain it, just once.  /var/home contains  home directories.  /var/log and
> /var/home are on the same filesystem.  So /var/log/* can be linked to
> /var/home/malicious, and that's just one of your basic misunderstandings.

yes you are on crack.

i told you i understand this exactly.  it's right there in the message i 
sent.

> No.  Look, you obviously haven't read what I've told you.  I mean, it's very
> obvious you haven't.  I'm wasting my time on you and I'm now out of
> generosity.  Good luck to you.  I think you need it.

you're the idiot not actually reading my messages.

-dean


Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-29 Thread dean gaudet
On Sat, 29 Dec 2007, David Newall wrote:

> dean gaudet wrote:
> > On Wed, 19 Dec 2007, David Newall wrote:
> >   
> > > Mark Lord wrote:
> > > 
> > > > But.. pity there's no mount flag override for smaller systems,
> > > > where bind mounts might be more useful with link(2) actually working.
> > > >   
> > > I don't see it.  You always can make hard link on the underlying
> > > filesystem.
> > > If you need to make it on the bound mount, that is, if you can't locate
> > > the
> > > underlying filesystem to make the hard link, you can use a symbolic link.
> > > 
> > 
> > i run into it on a system where /home is a bind mount of /var/home ... i did
> > this because:
> > 
> > - i prefer /home to be nosuid,nodev (multi-user system)
> >   
> 
> Whatever security /home has, /var/home is the one that restricts because users
> can still access their files that way.

yep.  and /var is nosuid,nodev as well.

> > - i prefer /home to not be on same fs as /
> > - the system has only one raid1 array, and i can't stand having two
> > writable filesystems competing on the same set of spindles (i like to
> >   imagine that one fs competing for the spindles can potentially result
> >   in better seek patterns)
> > ...
> > - i didn't want to try to balance disk space between /var and /home
> > - i didn't want to use a volume mgr just to handle disk space balance...
> >   
> 
> Pffuff.  That's what volume managers are for!  You do have (at least) two
> independent spindles in your RAID1 array, which give you less need to worry
> about head-stack contention.

this system is write intensive and writes go to all spindles, so your
assertion is wrong.  a quick look at iostat shows the system has averaged
50/50 reads/writes over 34 days.  that means 50% of the IO is going to
both spindles.

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        1.96    2.24  33.65  33.16  755.50  465.45     36.55      0.56   8.43   5.98  39.96

> You probably want different mount restrictions
> on /home than /var, so you really must use separate filesystems.

not sure why you think i want different restrictions... i'm running fine
with nosuid,nodev for /var.

the main worry i have is some user maliciously hardlinks everything
under /var/log somewhere else and slowly fills up the file system with
old rotated logs.  the users otherwise have quotas so they can't fill
things up on their own.  i could probably set up XFS quota trees (aka
"projects") but haven't gone to this effort yet.


> LVM is your friend.

i disagree.  but this is getting into personal taste -- i find volume
managers to be an unnecessary layer of complexity.  given i need quotas for
the users anyhow i don't see why i should both manage my disk space via
quotas and via an extra block layer.


> 
> But with regards to bind mounts and hard links:  If you want to be able to
> hard-link /home/me/log to /var/tmp/my-log, then I see nothing to prevent
> hard-linking /var/home/me/log to /var/tmp/my-log.

you probably missed the point where i said that i was surprised i couldn't
hardlink across the bind mount and actually wanted it to work.

-dean


Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-28 Thread dean gaudet
On Sat, 29 Dec 2007, Jan Engelhardt wrote:

> 
> On Dec 28 2007 18:53, dean gaudet wrote:
> >p.s. in retrospect i probably could have arranged it more like this:
> >
> >  mount /dev/md1 $tmpmntpoint
> >  mount --bind $tmpmntpoint/var /var
> >  mount --bind $tmpmntpoint/home /home
> >  umount $tmpmntpoint
> >
> >except i can't easily specify that in fstab... and neither of the bind 
> >mounts would show up in df(1).  seems like it wouldn't be hard to support 
> >this type of subtree mount though.  mount(8) could support a single 
> >subtree mount using this technique but the second subtree mount attempt 
> >would fail because you can't temporarily remount the device because the 
> >mount point is gone.
> 
> Why is it gone?
> 
> mount /dev/md1 /tmpmnt
> mount --bind /tmpmnt/var /var
> mount --bind /tmpmnt/home /home
> 
> Is perfectly fine, and /tmpmnt is still alive and mounted. Additionally,
> you can
> 
> umount /tmpmnt
> 
> now, which leaves only /var and /home.

i was trying to come up with a userland-only change in mount(8) which
would behave like so:

# mount --subtree var /dev/md1 /var
  internally mount does:
  - mount /dev/md1 /tmpmnt
  - mount --bind /tmpmnt/var /var
  - umount /tmpmnt

# mount --subtree home /dev/md1 /home
  internally mount does:
  - mount /dev/md1 /tmpmnt
  - mount --bind /tmpmnt/home /home
  - umount /tmpmnt

but that second mount would fail because /dev/md1 is already mounted
(but the mount point is gone)...

it certainly works if i issue the commands individually as i described
-- but a change within mount(8) would have the benefit of working with
/etc/fstab too.

-dean


Re: RFC: permit link(2) to work across --bind mounts ?

2007-12-28 Thread dean gaudet
On Wed, 19 Dec 2007, David Newall wrote:

> Mark Lord wrote:
> > But.. pity there's no mount flag override for smaller systems,
> > where bind mounts might be more useful with link(2) actually working.
> 
> I don't see it.  You always can make hard link on the underlying filesystem.
> If you need to make it on the bound mount, that is, if you can't locate the
> underlying filesystem to make the hard link, you can use a symbolic link.

i run into it on a system where /home is a bind mount of /var/home ... i 
did this because:

- i prefer /home to be nosuid,nodev (multi-user system)
- i prefer /home to not be on same fs as /
- the system has only one raid1 array, and i can't stand having two 
  writable filesystems competing on the same set of spindles (i like to
  imagine that one fs competing for the spindles can potentially result
  in better seek patterns)
- i didn't want to do /var -> /home/var or vice versa ... because i don't 
  like seeing "/var/home/dean" when i'm in my home dir and such.
- i didn't want to try to balance disk space between /var and /home
- i didn't want to use a volume mgr just to handle disk space balance...

so i gave a bind mount a try.

i was surprised to see that mv(1) between /var and /home causes the file 
to be copied due to the link(2) failing...
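
the failure is link(2) returning EXDEV even though both paths live on
the same underlying filesystem; a minimal demonstration (paths assume
the /home-is-a-bind-of-/var/home setup above):

/* link() across the bind mount boundary fails with EXDEV,
 * which is why mv(1) falls back to copy+unlink.
 */
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    if (link("/var/log/syslog", "/home/dean/syslog.lnk") != 0)
        perror("link");   /* e.g. "link: Invalid cross-device link" */
    return 0;
}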

it does seem like something which should be configurable per mount 
point... maybe that can be done with the patches i've seen going around 
supporting per-bind mount read-only/etc options?

-dean

p.s. in retrospect i probably could have arranged it more like this:

  mount /dev/md1 $tmpmntpoint
  mount --bind $tmpmntpoint/var /var
  mount --bind $tmpmntpoint/home /home
  umount $tmpmntpoint

except i can't easily specify that in fstab... and neither of the bind 
mounts would show up in df(1).  seems like it wouldn't be hard to support 
this type of subtree mount though.  mount(8) could support a single 
subtree mount using this technique but the second subtree mount attempt 
would fail because you can't temporarily remount the device because the 
mount point is gone.


Re: [RFC] Documentation about unaligned memory access

2007-11-26 Thread dean gaudet
On Fri, 23 Nov 2007, Arne Georg Gleditsch wrote:

> dean gaudet <[EMAIL PROTECTED]> writes:
> > on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 
> > bytes.  the penalty is a mere 3 cycles if an access crosses the specified 
> > boundary.
> 
> Worth noting though, is that atomic accesses that cross cache lines on
> an Opteron system is going to lock down the Hypertransport fabric for
> you during the operation -- which is obviously not so nice.

ooh awesome, i hadn't measured that before.

on a 2 node sockF / revF with a random pointer chase running on cpu 0 / 
node 0 i see the avg load-to-load cache miss latency jump from 77ns to 
109ns when i add an unaligned lock-intensive workload on one core of node 
1.  the worst i can get the pointer chase latency to is 273ns when i add 
two threads on node 1 fighting over an unaligned lock.

on a 4 node (square) the worst case i can get seems to be an increase from 
98ns with no antagonist to 385ns with 6 antagonists fighting over an 
unaligned lock on the other 3 nodes.

cool.
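
for anyone wanting to reproduce it, the antagonist boils down to a
locked op on an address straddling a cacheline; a sketch along these
lines (gcc builtin; not the exact code i ran):

/* hammer a locked add on a 4-byte word spanning a 64-byte boundary */
#include <stdint.h>

static char buf[128] __attribute__((aligned(64)));

int main(void)
{
    /* bytes 62..65: crosses the cacheline boundary at offset 64 */
    volatile uint32_t *p = (volatile uint32_t *)(buf + 62);

    for (;;)
        __sync_fetch_and_add(p, 1);   /* lock xadd, split across lines */
}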

-dean


Re: [RFC] Documentation about unaligned memory access

2007-11-22 Thread dean gaudet
On Fri, 23 Nov 2007, Alan Cox wrote:

> Its usually faster if you don't misalign on x86 as well.

i'm not sure if i agree with "usually"... but i know you (alan) are 
probably aware of the exact requirements of the hw.

for everyone else:

on intel x86 processors an access is unaligned only if it crosses a 
cacheline boundary (64 bytes).  otherwise it's aligned.  the penalty for 
crossing a cacheline boundary varies from ~12 cycles (core2) to many 
dozens of cycles (p4).

on AMD x86 pre-family 10h the boundary is 8 bytes, and on fam 10h it's 16 
bytes.  the penalty is a mere 3 cycles if an access crosses the specified 
boundary.

if you're making <= 4 byte accesses i recommend not worrying about 
alignment on x86.  it's pretty hard to beat the hardware support.
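
(to make "crosses the boundary" concrete, here's a sketch of a single
4-byte load straddling a 64-byte line; memcpy is the portable way to
spell the unaligned access:)

/* a 4-byte load spanning a 64-byte cacheline boundary -- the case
 * that costs the extra cycles described above
 */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    static char buf[128] __attribute__((aligned(64)));
    uint32_t v;

    memcpy(&v, buf + 62, sizeof(v));  /* bytes 62..65 cross the line at 64 */
    printf("%u\n", v);
    return 0;
}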

i curse all the RISC and embedded processor designers who pretend 
unaligned accesses are something evil and to be avoided.  in case you're 
worried, MIPS patent 4,814,976 expired in december 2006 :)

-dean


Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)

2007-11-20 Thread dean gaudet
On Tue, 20 Nov 2007, dean gaudet wrote:

> On Tue, 20 Nov 2007, Metzger, Markus T wrote:
> 
> > +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c)
> > +{
> > +   switch (c->x86) {
> > +   case 0x6:
> > +   switch (c->x86_model) {
> > +#ifdef __i386__
> > +   case 0xD:
> > +   case 0xE: /* Pentium M */
> > +   ptrace_bts_ops = ptrace_bts_ops_pentium_m;
> > +   break;
> > +#endif /* _i386_ */
> > +   case 0xF: /* Core2 */
> > +   ptrace_bts_ops = ptrace_bts_ops_core2;
> > +   break;
> > +   default:
> > +   /* sorry, don't know about them */
> > +   break;
> > +   }
> > +   break;
> > +   case 0xF:
> > +   switch (c->x86_model) {
> > +#ifdef __i386__
> > +   case 0x0:
> > +   case 0x1:
> > +   case 0x2:
> > +   case 0x3: /* Netburst */
> > +   ptrace_bts_ops = ptrace_bts_ops_netburst;
> > +   break;
> > +#endif /* _i386_ */
> > +   default:
> > +   /* sorry, don't know about them */
> > +   break;
> > +   }
> > +   break;
> 
> is this right?  i thought intel family 15 models 3 and 4 supported amd64
> mode...

actually... why aren't you using cpuid level 1 edx bit 21 to 
enable/disable this feature?  isn't that the bit defined to indicate 
whether this feature is supported or not?
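
i.e. the userland flavour of that check would be something like this
sketch (assumes gcc's <cpuid.h>):

/* CPUID.1:EDX bit 21 is the DS (debug store) feature flag */
#include <cpuid.h>

static int have_ds(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 21) & 1;
}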

and it seems like this patch and perfmon2 are going to have to live with 
each other... since they both require the use of the DS save area...

-dean


Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)

2007-11-20 Thread dean gaudet
On Tue, 20 Nov 2007, Metzger, Markus T wrote:

> +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c)
> +{
> + switch (c->x86) {
> + case 0x6:
> + switch (c->x86_model) {
> +#ifdef __i386__
> + case 0xD:
> + case 0xE: /* Pentium M */
> + ptrace_bts_ops = ptrace_bts_ops_pentium_m;
> + break;
> +#endif /* _i386_ */
> + case 0xF: /* Core2 */
> + ptrace_bts_ops = ptrace_bts_ops_core2;
> + break;
> + default:
> + /* sorry, don't know about them */
> + break;
> + }
> + break;
> + case 0xF:
> + switch (c->x86_model) {
> +#ifdef __i386__
> + case 0x0:
> + case 0x1:
> + case 0x2:
> + case 0x3: /* Netburst */
> + ptrace_bts_ops = ptrace_bts_ops_netburst;
> + break;
> +#endif /* _i386_ */
> + default:
> + /* sorry, don't know about them */
> + break;
> + }
> + break;

is this right?  i thought intel family 15 models 3 and 4 supported amd64
mode...

-dean


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-20 Thread dean gaudet
On Mon, 19 Nov 2007, Ingo Molnar wrote:

> 
> * Eric Dumazet <[EMAIL PROTECTED]> wrote:
> 
> > I do see a problem, because some readers will take your example as a 
> > reference, as it will probably sit in a page that 
> > google^Wsearch_engines will bring at the top of search results for 
> > next ten years or so.
> > 
> > (I bet for "sys_indirect syscall" -> http://lwn.net/Articles/258708/ )
> > 
> > Next time you post it, please warn users that it will break in some 
> > years, or state clearly this should only be used internally by glibc.
> 
> dont be silly, next time Ulrich should also warn everyone that running 
> attachments and applying patches from untrusted sources is dangerous?
> 
> any code that includes:
> 
>   fd = syscall (__NR_indirect, &r, &i, sizeof (i));
> 
> is by definition broken and unportable in every sense of the word. Apps 
> will use the proper glibc interfaces (if it's exposed).

as an application writer how do i access accept(2) with FD_CLOEXEC 
functionality?  will glibc expose an accept2() with a flags param?  if 
so... why don't we just have an accept2() syscall?
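
today the only option is the racy two-step, which is exactly what a
flags argument would fix; sketch:

/* portable fallback: accept, then mark close-on-exec.  another thread
 * doing fork+exec between the two calls leaks the fd -- that's the race.
 */
#include <sys/socket.h>
#include <fcntl.h>

int accept_cloexec(int listenfd)
{
    int fd = accept(listenfd, NULL, NULL);

    if (fd >= 0)
        fcntl(fd, F_SETFD, FD_CLOEXEC);
    return fd;
}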

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv3 0/4] sys_indirect system call

2007-11-20 Thread dean gaudet
On Mon, 19 Nov 2007, Ingo Molnar wrote:

 
 * Eric Dumazet [EMAIL PROTECTED] wrote:
 
  I do see a problem, because some readers will take your example as a 
  reference, as it will probably sit in a page that 
  google^Wsearch_engines will bring at the top of search results for 
  next ten years or so.
  
  (I bet for sys_indirect syscall - http://lwn.net/Articles/258708/ )
  
  Next time you post it, please warn users that it will break in some 
  years, or state clearly this should only be used internally by glibc.
 
 dont be silly, next time Ulrich should also warn everyone that running 
 attachments and applying patches from untrusted sources is dangerous?
 
 any code that includes:
 
   fd = syscall (__NR_indirect, r, i, sizeof (i));
 
 is by definition broken and unportable in every sense of the word. Apps 
 will use the proper glibc interfaces (if it's exposed).

as an application writer how do i access accept(2) with FD_CLOEXEC 
functionality?  will glibc expose an accept2() with a flags param?  if 
so... why don't we just have an accept2() syscall?

-dean
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)

2007-11-20 Thread dean gaudet
On Tue, 20 Nov 2007, dean gaudet wrote:

 On Tue, 20 Nov 2007, Metzger, Markus T wrote:
 
  +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c)
  +{
  +   switch (c-x86) {
  +   case 0x6:
  +   switch (c-x86_model) {
  +#ifdef __i386__
  +   case 0xD:
  +   case 0xE: /* Pentium M */
  +   ptrace_bts_ops = ptrace_bts_ops_pentium_m;
  +   break;
  +#endif /* _i386_ */
  +   case 0xF: /* Core2 */
  +   ptrace_bts_ops = ptrace_bts_ops_core2;
  +   break;
  +   default:
  +   /* sorry, don't know about them */
  +   break;
  +   }
  +   break;
  +   case 0xF:
  +   switch (c-x86_model) {
  +#ifdef __i386__
  +   case 0x0:
  +   case 0x1:
  +   case 0x2:
  +   case 0x3: /* Netburst */
  +   ptrace_bts_ops = ptrace_bts_ops_netburst;
  +   break;
  +#endif /* _i386_ */
  +   default:
  +   /* sorry, don't know about them */
  +   break;
  +   }
  +   break;
 
 is this right?  i thought intel family 15 models 3 and 4 supported amd64
 mode...

actually... why aren't you using cpuid level 1 edx bit 21 to 
enable/disable this feature?  isn't that the bit defined to indicate 
whether this feature is supported or not?

and it seems like this patch and perfmon2 are going to have to live with 
each other... since they both require the use of the DS save area...

-dean
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)

2007-11-20 Thread dean gaudet
On Tue, 20 Nov 2007, Metzger, Markus T wrote:

> +__cpuinit void ptrace_bts_init_intel(struct cpuinfo_x86 *c)
> +{
> +	switch (c->x86) {
> +	case 0x6:
> +		switch (c->x86_model) {
> +#ifdef __i386__
> +		case 0xD:
> +		case 0xE: /* Pentium M */
> +			ptrace_bts_ops = ptrace_bts_ops_pentium_m;
> +			break;
> +#endif /* _i386_ */
> +		case 0xF: /* Core2 */
> +			ptrace_bts_ops = ptrace_bts_ops_core2;
> +			break;
> +		default:
> +			/* sorry, don't know about them */
> +			break;
> +		}
> +		break;
> +	case 0xF:
> +		switch (c->x86_model) {
> +#ifdef __i386__
> +		case 0x0:
> +		case 0x1:
> +		case 0x2:
> +		case 0x3: /* Netburst */
> +			ptrace_bts_ops = ptrace_bts_ops_netburst;
> +			break;
> +#endif /* _i386_ */
> +		default:
> +			/* sorry, don't know about them */
> +			break;
> +		}
> +		break;

is this right?  i thought intel family 15 models 3 and 4 supported amd64
mode...

-dean
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 4/4] first use of sys_indirect system call

2007-11-16 Thread dean gaudet
On Fri, 16 Nov 2007, Ulrich Drepper wrote:

> dean gaudet wrote:
> > honestly i think there should be a per-task flag which indicates whether 
> > fds are by default F_CLOEXEC or not.  my reason:  third party libraries.
> 
> Only somebody who thinks exclusively about applications as opposed to
> runtimes/libraries can say something like that.  Library writers don't
> have the luxury of being able to modify any global state.  This has all
> been discussed here before.

only someone who thinks about writing libraries can say something like 
that.  you've solved the problem for yourself, and for well written 
applications, but not for the other 99.% of libraries out there.

i'm not suggesting the library set the global flag.  i'm suggesting that 
me as an app writer will do so.

it seems like both methods are useful.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 4/4] first use of sys_indirect system call

2007-11-16 Thread dean gaudet
you know... i understand the need for FD_CLOEXEC -- in fact i tried 
petitioning for CLOEXEC options to all the fd creating syscalls something 
like 7 years ago when i was banging my head against the wall trying to 
figure out how to thread apache... but even still i'm not convinced that 
extending every system call which creates an fd is the way to do this.  
honestly i think there should be a per-task flag which indicates whether 
fds are by default F_CLOEXEC or not.  my reason:  third party libraries.

i can control all my own code in a threaded program, but i can't control 
all the code which is linked in.  fds are going to leak.

if i set a per task flag then the only thing which would break are third 
party libraries which use fork/exec and aren't aware they may need to 
unset F_CLOEXEC.  personally i'd rather break that than leak fds to 
another program.

but hey i'm happy to see this sort of thing is finally being fixed, 
thanks.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: perfmon2 merge news

2007-11-16 Thread dean gaudet
On Fri, 16 Nov 2007, Andi Kleen wrote:

> I didn't see a clear list. 

- cross platform extensible API for configuring perf counters
- support for multiplexed counters
- support for virtualized 64-bit counters
- support for PC and call graph sampling at specific intervals
- support for reading counters not necessarily with sampling
- taskswitch support for counters
- API available from userland
- ability to self-monitor: need select/poll/etc interface
- support for PEBS, IBS and whatever other new perf monitoring 
  infrastructure the vendors throw at us in the future
- low overhead:  must minimize the "probe effect" of monitoring
- low noise in measurements:  cannot achieve this in userland

perfmon2 has all of this and more i've probably neglected...

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [perfmon] Re: [perfmon2] perfmon2 merge news

2007-11-14 Thread dean gaudet
On Thu, 15 Nov 2007, Paul Mackerras wrote:

> dean gaudet writes:
> 
> > actually multiplexing is the main feature i am in need of. there are an 
> > insufficient number of counters (even on k8 with 4 counters) to do 
> > complete stall accounting or to get a general overview of L1d/L1i/L2 cache 
> > hit rates, average miss latency, time spent in various stalls, and the 
> > memory system utilization (or HT bus utilization).  this runs out to 
> > something like 30 events which are interesting... and re-running a 
> > benchmark over and over just to get around the lack of multiplexing is a 
> > royal pain in the ass.
> 
> So by "multiplexing" do you mean the ability to have multiple event
> sets associated with a context and have the kernel switch between them
> automatically?

yep.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [perfmon] Re: [perfmon2] perfmon2 merge news

2007-11-14 Thread dean gaudet
On Wed, 14 Nov 2007, Andi Kleen wrote:

> Later a syscall might be needed with event multiplexing, but that seems
> more like a far away non essential feature.

actually multiplexing is the main feature i am in need of. there are an 
insufficient number of counters (even on k8 with 4 counters) to do 
complete stall accounting or to get a general overview of L1d/L1i/L2 cache 
hit rates, average miss latency, time spent in various stalls, and the 
memory system utilization (or HT bus utilization).  this runs out to 
something like 30 events which are interesting... and re-running a 
benchmark over and over just to get around the lack of multiplexing is a 
royal pain in the ass.
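
(for clarity, a sketch of the bookkeeping i mean -- not the perfmon2
API, just the idea: rotate event sets on a timer tick, track how long
each set was live, and scale the raw counts afterwards:)

    /* estimate of the full-run count for one multiplexed event */
    unsigned long long estimate(unsigned long long raw,
                                unsigned long long active_ns,
                                unsigned long long total_ns)
    {
        return active_ns ? raw * total_ns / active_ns : 0;
    }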

it's not a "far away non-essential feature" to me.  it's something i would 
use daily if i had all the pieces together now (and i'm constrained 
because i cannot add an out-of-tree patch which adds unofficial syscalls 
to the kernel i use).

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: TCP_DEFER_ACCEPT issues

2007-11-04 Thread dean gaudet
fwiw i also brought the TCP_DEFER_ACCEPT problems up the end of last year:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg28916.html

it's possible the final message in that thread is how we should define the 
behaviour, i haven't tried the TCP_SYNCNT idea though.
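
(for reference, the option under discussion is set like this, with the
argument in seconds and listen_fd assumed to be the listening socket:)

    int secs = 10;  /* wait up to 10s for data before waking accept() */
    setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
               &secs, sizeof(secs));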

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread dean gaudet
On Sun, 21 Oct 2007, Jeremy Fitzhardinge wrote:

> dean gaudet wrote:
> > On Mon, 15 Oct 2007, Nick Piggin wrote:
> >
> >   
> >> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> >> because it generally has to invalidate TLBs on all CPUs.
> >> 
> >
> > why is that?  ignoring 32-bit archs we have heaps of address space 
> > available... couldn't the kernel just burn address space and delay global 
> > TLB invalidate by some relatively long time (say 1 second)?
> >   
> 
> Yes, that's precisely the problem.  xfs does delay the unmap, leaving
> stray mappings, which upsets Xen.

sounds like a bug in xen to me :)

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Interaction between Xen and XFS: stray RW mappings

2007-10-21 Thread dean gaudet
On Mon, 15 Oct 2007, Nick Piggin wrote:

> Yes, as Dave said, vmap (more specifically: vunmap) is very expensive
> because it generally has to invalidate TLBs on all CPUs.

why is that?  ignoring 32-bit archs we have heaps of address space 
available... couldn't the kernel just burn address space and delay global 
TLB invalidate by some relatively long time (say 1 second)?
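
(i.e. something along these lines -- pseudocode only, with purge_list,
next_flush and free_purge_list as made-up names:)

    void lazy_vunmap(struct vm_struct *area)
    {
        list_add(&area->list, &purge_list);    /* don't reuse the VA yet */
        if (time_after(jiffies, next_flush)) {
            flush_tlb_all();                   /* one global invalidate */
            free_purge_list();                 /* VA space reusable now */
            next_flush = jiffies + HZ;         /* ~1 second */
        }
    }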

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Intel Memory Ordering White Paper

2007-09-08 Thread dean gaudet
On Sat, 8 Sep 2007, Petr Vandrovec wrote:

> dean gaudet wrote:
> > On Sun, 9 Sep 2007, Nick Piggin wrote:
> > 
> > > I've also heard that string operations do not follow the normal ordering,
> > > but
> > > that's just with respect to individual loads/stores in the one operation,
> > > I
> > > hope? And they will still follow ordering rules WRT surrounding loads and
> > > stores?
> > 
> > see section 7.2.3 of intel volume 3A...
> > 
> > "Code dependent upon sequential store ordering should not use the string
> > operations for the entire data structure to be stored. Data and semaphores
> > should be separated. Order dependent code should use a discrete semaphore
> > uniquely stored to after any string operations to allow correctly ordered
> > data to be seen by all processors."
> > 
> > i think we need sfence after things like copy_page, clear_page, and possibly
> > copy_user... at least on intel processors with fast strings option enabled.
> 
> I do not think.  I believe that authors are trying to say that
> 
> struct { uint8 lock; uint8 data; } x;
> 
> lea (x.data),%edi
> mov $2,%ecx
> std
> rep movsb
> 
> to set both data and lock does not guarantee that x.lock will be set after
> x.data and that you should do
> 
> lea (x.data),%edi
> std
> movsb
> movsb  # or mov (%esi),%al; mov %al,(%edi), but movsb looks discrete enough to
> me
> 
> instead (and yes, I know that my example is silly).

no it's worse than that -- intel fast string stores can become globally 
visible in any order at all w.r.t. normal loads or stores... so take all 
those great examples in their recent whitepaper and throw out all the 
ordering guarantees for addresses on different cachelines if any of the 
stores are rep string.

for example transitive store ordering for locations on multiple cachelines 
is not guaranteed at all.  the kernel could return a zero page and one 
core could see the zeroes out of order with another core performing some 
sort of lockless data structure operation.

fast strings don't break ordering from the point of view of the core 
performing the rep string operation, but externally there are no 
guarantees (it's right there in the docs).

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Intel Memory Ordering White Paper

2007-09-08 Thread dean gaudet
On Sun, 9 Sep 2007, Nick Piggin wrote:

> I've also heard that string operations do not follow the normal ordering, but
> that's just with respect to individual loads/stores in the one operation, I
> hope? And they will still follow ordering rules WRT surrounding loads and
> stores?

see section 7.2.3 of intel volume 3A...

"Code dependent upon sequential store ordering should not use the string 
operations for the entire data structure to be stored. Data and semaphores 
should be separated. Order dependent code should use a discrete semaphore 
uniquely stored to after any string operations to allow correctly ordered 
data to be seen by all processors."

i think we need sfence after things like copy_page, clear_page, and 
possibly copy_user... at least on intel processors with fast strings 
option enabled.
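
(concretely, the pattern at risk is publish-after-copy; a sketch, with
shared_buf and shared_ready assumed to be visible to other cores:)

    memcpy(shared_buf, src, len);        /* may use rep movs (fast strings) */
    asm volatile("sfence" ::: "memory"); /* order the string stores... */
    shared_ready = 1;                    /* ...before publishing them */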

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 5/5] x86: Set PCI config space size to extended for AMD Barcelona

2007-09-03 Thread dean gaudet
it's so very unfortunate the PCI standard has no feature bit to indicate 
the presence of ECS.

FWIW in my testing on a range of machines spanning 7 or 8 years i could 
read config space reg 256... and get 0x when the device didn't 
support ECS, and get valid data when the device did support ECS... granted 
there may be some system out there which behaves really badly when you do 
this.

perhaps someone could write a userspace program and test that concept on a 
far wider range of machines.
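
(a sketch of such a test, reading dword 0x100 through sysfs -- note the
kernel caps the config file at the device's cfg_size, so a fixup like
the one below has to be in place before the probe can see past 256; the
device path is just an example and error handling is omitted:)

    int fd = open("/sys/bus/pci/devices/0000:00:18.0/config", O_RDONLY);
    unsigned int v = 0;
    if (fd >= 0 && pread(fd, &v, 4, 256) == 4)
        printf("reg 256 = %#x (0xffffffff suggests no ECS)\n", v);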

-dean

On Mon, 3 Sep 2007, Robert Richter wrote:

> This patch sets the config space size for AMD Barcelona PCI devices to
> 4096.
> 
> Signed-off-by: Robert Richter <[EMAIL PROTECTED]>
> 
> ---
>  arch/i386/pci/fixup.c |   14 ++
>  1 file changed, 14 insertions(+)
> 
> Index: linux-2.6/arch/i386/pci/fixup.c
> ===
> --- linux-2.6.orig/arch/i386/pci/fixup.c
> +++ linux-2.6/arch/i386/pci/fixup.c
> @@ -8,6 +8,7 @@
>  #include <linux/init.h>
>  #include "pci.h"
>  
> +#define PCI_CFG_SPACE_EXP_SIZE   4096
>  
>  static void __devinit pci_fixup_i450nx(struct pci_dev *d)
>  {
> @@ -444,3 +445,16 @@ static void __devinit pci_siemens_interr
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SIEMENS, 0x0015,
> pci_siemens_interrupt_controller);
> +
> +/*
> + * Extend size of PCI configuration space for AMD CPUs
> + */
> +static void __devinit pci_ext_cfg_space_access(struct pci_dev *dev)
> +{
> + dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_HT,   
> pci_ext_cfg_space_access);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_MAP,  
> pci_ext_cfg_space_access);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_DRAM, 
> pci_ext_cfg_space_access);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_MISC, 
> pci_ext_cfg_space_access);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_FAM10H_LINK, 
> pci_ext_cfg_space_access);
> 
> -- 
> AMD Saxony, Dresden, Germany
> Operating System Research Center
> email: [EMAIL PROTECTED]
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [1/12] x86: Work around mmio config space quirk on AMD Fam10h

2007-08-12 Thread dean gaudet
On Sun, 12 Aug 2007, Linus Torvalds wrote:

> On Sun, 12 Aug 2007, Dave Jones wrote:
> > 
> > This does make me wonder, why these weren't caught in -mm ?
> 
> I'm worried that -mm isn't getting a lot of exposure these days. People do 
> run it, but I wonder how many..

andrew caught it in -mm and reverted it.  it crashed his vaio.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: TLB sizes among x86 CPUs?

2007-07-31 Thread dean gaudet
http://sandpile.org/

On Wed, 18 Jul 2007, Rene Herman wrote:

> Good day.
> 
> Would anyone happen to have a list of TLB sizes for some selected x86{,-64}
> CPUs? I know it goes from a few entries on a 386 to a lot on Opteron but I
> have a real hard time finding specific data.
> 
> Rene.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] hugetlbfs read() support

2007-07-30 Thread dean gaudet
On Thu, 19 Jul 2007, Bill Irwin wrote:

> On Thu, Jul 19, 2007 at 10:07:59AM -0700, Nishanth Aravamudan wrote:
> > But I do think a second reason to do this is to make hugetlbfs behave
> > like a normal fs -- that is read(), write(), etc. work on files in the
> > mountpoint. But that is simply my opinion.
> 
> Mine as well.

ditto.  here's a few other things i've run into recently:

it should be possible to use cp(1) to load large datasets into a 
hugetlbfs.

it should be possible to use ftruncate() on hugetlbfs files.  (on a tmpfs 
it's req'd to extend the file before mmaping... on hugetlbfs it returns 
EINVAL or somesuch and mmap just magically extends files.)

it should be possible to statfs() and get usage info... this works only if 
you mount with size=N.
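
(e.g. the tmpfs-style sequence below should just work on hugetlbfs too;
today the ftruncate() fails and the mmap() does the extending -- len is
assumed to be a multiple of the huge page size:)

    int fd = open("/mnt/huge/data", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, len);     /* currently fails on hugetlbfs */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);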

-dean




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFT][PATCH v7] sata_mv: convert to new EH

2007-07-18 Thread dean gaudet
On Wed, 18 Jul 2007, Pasi Kärkkäinen wrote:

> What brand/model your sata_mv controller is? Would be nice to know to be
> able to get a "known-to-work" one.. 

http://supermicro.com/products/accessories/addon/AoC-SAT2-MV8.cfm

-dean

Re: [RFT][PATCH v7] sata_mv: convert to new EH

2007-07-12 Thread dean gaudet
On Thu, 12 Jul 2007, Jeff Garzik wrote:

> dean gaudet wrote:
> > On Thu, 12 Jul 2007, Jeff Garzik wrote:
> > 
> > > dean gaudet wrote:
> > > > oh very nice... no warnings on boot, and no warnings while i "dd
> > > > if=/dev/sdX
> > > > of=/dev/null" and i'm seeing 74MB/s+ from each disk on this simple read
> > > > test.
> > > > 
> > > > for lack of a better test i started an untar/diff stress test on the
> > > > disks... we'll see how it goes.  (it's based on doug ledford's
> > > > memtest.sh)
> > > Thanks for the testing.  Looks like we might have hit on something good...
> > 
> > yep this does look good.  no problems overnight in the untar/diff/rm
> > workload.  if you've got any other workload you'd like me to throw at it,
> > let me know.  i might be able to scare up a disk or two with errors to check
> > error handling.
> 
> Nothing specific.  I usually just throw various workloads at it, both
> throughput-intensive, seek-intensive, multiple threads at the same time,
> stressing multiple disks at the same time, etc.

yeah the untar/diff/rm workload is seek/thread intensive.


> I presume from your past messages your tests include multiple disks at the
> same time?

yep, 4 disks... i was getting 4x74MB/s with dd read.  unfortunately i 
don't have more disks in the system at this point so i can't test all 8 
ports at full tilt.

each disk had its own XFS filesystem.

> > i tested hotplug just for kicks... no luck there :)  but then you didn't say
> > that would work yet.
> 
> hehehe Well I sorta didn't want to mention it, to avoid clouding the waters
> further.
> 
> In theory, hotplug and hot unplug -should- work, in version 7.  Your report is
> a useful contradiction of that theory, and signals where to poke next.  Since
> all this hacking is a spare-time effort, no promises as to when next I'll poke
> at it.   It might be tomorrow, or a month from now.  Getting "new EH" upstream
> was a big hurdle to overcome, and your testing really helped that along.
> 
> Anyway, something like Version 7 is probably what I will push upstream for
> 2.6.23-rc1.

cool, thanks.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFT][PATCH v7] sata_mv: convert to new EH

2007-07-12 Thread dean gaudet
On Thu, 12 Jul 2007, Jeff Garzik wrote:

> dean gaudet wrote:
> > oh very nice... no warnings on boot, and no warnings while i "dd if=/dev/sdX
> > of=/dev/null" and i'm seeing 74MB/s+ from each disk on this simple read
> > test.
> > 
> > for lack of a better test i started an untar/diff stress test on the
> > disks... we'll see how it goes.  (it's based on doug ledford's memtest.sh)
> 
> Thanks for the testing.  Looks like we might have hit on something good...

yep this does look good.  no problems overnight in the untar/diff/rm 
workload.  if you've got any other workload you'd like me to throw at it, 
let me know.  i might be able to scare up a disk or two with errors to 
check error handling.

i tested hotplug just for kicks... no luck there :)  but then you didn't 
say that would work yet.

thanks!
-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFT][PATCH v7] sata_mv: convert to new EH

2007-07-12 Thread dean gaudet
On Wed, 11 Jul 2007, Jeff Garzik wrote:

> As before, this patch is against 2.6.22 with no other patches needed nor
> applied.
> 
> In this revision, interrupt handling was improved quite a bit,
> particularly for EDMA.  The WARNING in mv_get_crpb_status() goes away,
> because that routine went away.  Its EDMA handling was potentially racy
> as well.  It was replaced with a loop in mv_intr_edma() that guarantees
> it always clears responses out of the queue, not a single response.
> 
> Here's hoping that the WARNING in mv_qc_issue() goes away as well, but I
> am less than 50% confident that will happen.
> 
> The driver is making substantial progress with all these improvements,
> though, in searching for the cause of this hardware behavior :)
> 
> Though if mv_qc_issue() still warns, I would be interested to know if
> this driver works OK if the mv_qc_issue() warning is simply removed at
> that point...

oh very nice... no warnings on boot, and no warnings while i "dd 
if=/dev/sdX of=/dev/null" and i'm seeing 74MB/s+ from each disk on this 
simple read test.

for lack of a better test i started an untar/diff stress test on the 
disks... we'll see how it goes.  (it's based on doug ledford's 
memtest.sh)

thanks!
-dean

sata_mv_v7_boot.txt.gz
Description: Binary data


Re: [RFT][PATCH 2/2] sata_mv: convert to new EH (v5)

2007-07-10 Thread dean gaudet
On Mon, 9 Jul 2007, Jeff Garzik wrote:

> 
> This is the latest update of the sata_mv conversion to new EH.  I'm
> looking for testers, of two configurations:
> 
>   2.6.22 + patch #1   (baseline)
>   2.6.22 + patch #1 + this patch  (sata_mv new EH)
> 
> This patch contains a small but key race fix in the interrupt handler,
> which should improve things over version 4 of this patch (posted
> 05/26/2007).

it does seem an improvement... but still not quite 100%.  now my system 
finishes booting but there's still lots of sata_mv warnings, and a simple 
dd from the disks on the sata_mv maxes out around 2MB/s.  i haven't tried 
anything more thorough.

dmesg attached... sorry it's a bit truncated, and pls ignore the MM0 
complaints, i've apparently got a bad umem nvram card in there.  it 
appears to contain the first sata_mv warnings so i think it's useful.

-dean

dmesg.txt.gz
Description: Binary data


Re: limits on raid

2007-06-17 Thread dean gaudet
On Sun, 17 Jun 2007, Wakko Warner wrote:

> What benefit would I gain by using an external journel and how big would it
> need to be?

i don't know how big the journal needs to be... i'm limited by xfs'
maximum journal size of 128MiB.

i don't have much benchmark data -- but here are some rough notes i took
when i was evaluating a umem NVRAM card.  since the pata disks in the
raid1 have write caching enabled it's somewhat of an unfair comparison,
but the important info is the 88 seconds for internal journal vs. 81
seconds for external journal.

-dean

time sh -c 'tar xf /var/tmp/linux-2.6.20.tar; sync'

xfs journal   raid5 bitmap   times
internal      none           0.18s user 2.14s system 2% cpu 1:27.95 total
internal      internal       0.16s user 2.16s system 1% cpu 2:01.12 total
raid1         none           0.07s user 2.02s system 2% cpu 1:20.62 total
raid1         internal       0.14s user 2.01s system 1% cpu 1:55.18 total
raid1         raid1          0.14s user 2.03s system 2% cpu 1:20.61 total
umem          none           0.13s user 2.07s system 2% cpu 1:20.77 total
umem          internal       0.15s user 2.16s system 2% cpu 1:51.28 total
umem          umem           0.12s user 2.13s system 2% cpu 1:20.50 total


raid5:
- 4x seagate 7200.10 400GB on marvell MV88SX6081
- mdadm --create --level=5 --raid-devices=4 /dev/md4 /dev/sd[abcd]1

raid1:
- 2x maxtor 6Y200P0 on 3ware 7504
- two 128MiB partitions starting at cyl 1
- mdadm --create --level=1 --raid-disks=2 --auto=yes --assume-clean /dev/md1 
/dev/sd[fg]1
- mdadm --create --level=1 --raid-disks=2 --auto=yes --assume-clean /dev/md2 
/dev/sd[fg]2
- md1 is used for external xfs journal
- md2 has an ext3 filesystem for the external md4 bitmap

xfs:
- mkfs.xfs issued before each run using the defaults (aside from -l 
logdev=/dev/md1)
- mount -o noatime,nodiratime[,logdev=/dev/md1] 

umem:
- 512MiB Micro Memory MM-5415CN
- 2 partitions similar to the raid1 setup
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: limits on raid

2007-06-17 Thread dean gaudet
On Sun, 17 Jun 2007, Wakko Warner wrote:

> dean gaudet wrote:
> > On Sat, 16 Jun 2007, Wakko Warner wrote:
> > 
> > > When I've had an unclean shutdown on one of my systems (10x 50gb raid5) 
> > > it's
> > > always slowed the system down when booting up.  Quite significantly I must
> > > say.  I wait until I can login and change the rebuild max speed to slow it
> > > down while I'm using it.   But that is another thing.
> > 
> > i use an external write-intent bitmap on a raid1 to avoid this... you 
> > could use internal bitmap but that slows down i/o too much for my tastes.  
> > i also use an external xfs journal for the same reason.  2 disk raid1 for 
> > root/journal/bitmap, N disk raid5 for bulk storage.  no spindles in 
> > common.
> 
> I must remember this if I have to rebuild the array.  Although I'm
> considering moving to a hardware raid solution when I upgrade my storage.

you can do it without a rebuild -- that's in fact how i did it the first 
time.

to add an external bitmap:

mdadm --grow --bitmap /bitmapfile /dev/mdX

plus add "bitmap=/bitmapfile" to mdadm.conf... as in:

ARRAY /dev/md4 bitmap=/bitmap.md4 UUID=dbc3be0b:b5853930:a02e038c:13ba8cdc

you can also easily move an ext3 journal to an external journal with 
tune2fs (see man page).

if you use XFS it's a bit more of a challenge to convert from internal to 
external, but see this thread:

http://marc.theaimsgroup.com/?l=linux-xfs&m=106929781232520&w=2

i found that i had to do "sb 1", "sb 2", ..., "sb N" for all sb rather 
than just the "sb 0" that email instructed me to do.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: limits on raid

2007-06-16 Thread dean gaudet
On Sat, 16 Jun 2007, Wakko Warner wrote:

> When I've had an unclean shutdown on one of my systems (10x 50gb raid5) it's
> always slowed the system down when booting up.  Quite significantly I must
> say.  I wait until I can login and change the rebuild max speed to slow it
> down while I'm using it.   But that is another thing.

i use an external write-intent bitmap on a raid1 to avoid this... you 
could use internal bitmap but that slows down i/o too much for my tastes.  
i also use an external xfs journal for the same reason.  2 disk raid1 for 
root/journal/bitmap, N disk raid5 for bulk storage.  no spindles in 
common.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: limits on raid

2007-06-16 Thread dean gaudet
On Sat, 16 Jun 2007, David Greaves wrote:

> Neil Brown wrote:
> > On Friday June 15, [EMAIL PROTECTED] wrote:
> >  
> > >   As I understand the way
> > > raid works, when you write a block to the array, it will have to read all
> > > the other blocks in the stripe and recalculate the parity and write it
> > > out.
> > 
> > Your understanding is incomplete.
> 
> Does this help?
> [for future reference so you can paste a url and save the typing for code :) ]
> 
> http://linux-raid.osdl.org/index.php/Initial_Array_Creation

i fixed a typo and added one more note which i think is quite fair:

It is also safe to use --assume-clean if you are performing
performance measurements of different raid configurations. Just
be sure to rebuild your array without --assume-clean when you
decide on your final configuration.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [shm][hugetlb] Fix get_policy for stacked shared memory files

2007-06-11 Thread dean gaudet
On Mon, 11 Jun 2007, Adam Litke wrote:

> Here's another breakage as a result of shared memory stacked files :(
> 
> The NUMA policy for a VMA is determined by checking the following (in the 
> order
> given):
> 
> 1) vma->vm_ops->get_policy() (if defined)
> 2) vma->vm_policy (if defined)
> 3) task->mempolicy (if defined)
> 4) Fall back to default_policy
> 
> By switching to stacked files for shared memory, get_policy() is now always 
> set
> to shm_get_policy which is a wrapper function.  This causes us to stop at step
> 1, which yields NULL for hugetlb instead of task->mempolicy which was the
> previous (and correct) result.
> 
> This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for the
> wrapped vm_ops.  Andi and Christoph, does this look right to you?

thanks for the patch -- seems to do the trick for me.  it seems like it 
would be a candidate for stable series as well.
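
for anyone following along, the fallback order quoted above comes out
roughly like this -- a sketch of the intended semantics only, with a
made-up helper name, not the exact mm/mempolicy.c code:

	/* sketch: which policy governs an allocation in this vma? */
	static struct mempolicy *vma_policy_sketch(struct vm_area_struct *vma,
						   struct task_struct *task,
						   unsigned long addr)
	{
		struct mempolicy *pol = NULL;

		if (vma->vm_ops && vma->vm_ops->get_policy)
			pol = vma->vm_ops->get_policy(vma, addr);	/* step 1 */
		else if (vma->vm_policy)
			pol = vma->vm_policy;				/* step 2 */
		if (!pol)
			pol = task->mempolicy;				/* step 3 */
		if (!pol)
			pol = &default_policy;				/* step 4 */
		return pol;
	}

the bug was that a stacked get_policy() wrapper made step 1 always
"defined", so a NULL result no longer fell through to steps 2-3.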

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 7/8] fdmap v2 - implement sys_socket2

2007-06-09 Thread dean gaudet
On Sat, 9 Jun 2007, Linus Torvalds wrote:

> IOW, the most common case for libraries is not that they get invoked to do 
> one thing, but that they get loaded and then used over and over and over 
> again, and the _reason_ for wanting to have a file descriptor open may 
> well be that the library wants to cache the file descriptor, rather than 
> having to open a file over and over again!

for an example of a library wanting to cache an open fd ... and failing 
miserably at protecting itself from the application closing its fd read:

http://bugzilla.padl.com/show_bug.cgi?id=304
http://bugzilla.padl.com/show_bug.cgi?id=305

basically libnss-ldap is trying to use getsockname/getpeername to prove 
that an fd belongs to it.  the failure modes are quite delightful.
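
the check is roughly this shape -- paraphrased, not the actual
libnss-ldap source: compare getpeername() on the cached fd against the
server address it thinks it connected to.  if the application closed
the library's fd and the number got reused by some other connected
socket, the results get interesting:

	#include <string.h>
	#include <sys/socket.h>

	/* sketch of the (fragile) "is this fd still mine?" test */
	static int fd_still_ours(int fd, const struct sockaddr_storage *want,
				 socklen_t want_len)
	{
		struct sockaddr_storage cur;
		socklen_t len = sizeof(cur);

		if (getpeername(fd, (struct sockaddr *)&cur, &len) != 0)
			return 0;	/* closed, or reused as a non-socket */
		return len == want_len && memcmp(&cur, want, len) == 0;
	}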

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21 numa policy and huge pages not working

2007-06-09 Thread dean gaudet
On Tue, 15 May 2007, William Lee Irwin III wrote:

> On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> > prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and 
> > the interleave policy would be respected.  as of 2.6.21 it doesn't seem to 
> > respect the policy on SHM_HUGETLB request.
> > see test program below.
> > output from pre-2.6.21:
> > 2ab19620 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 
> > N2=8 N3=8
> > 2ab19a20 default file=/SYSV\040(deleted) dirty=16384 active=0 
> > N0=4096 N1=4096 N2=4096 N3=4096
> > output from 2.6.21:
> > 2b49b1c0 default file=/10\040(deleted) huge dirty=32 N3=32
> > 2b49b5c0 default file=/SYSV\040(deleted) dirty=16384 active=0 
> > N0=4096 N1=4096 N2=4096 N3=4096
> > was this an intentional behaviour change?  it seems to be only affecting 
> > SHM_HUGETLB allocations.  (i haven't tested hugetlbfs yet.)
> > run with "numactl --interleave=all ./shmtest"
> 
> This was not intentional. I'll search for where it broke.

ok i've narrowed it some... maybe.

in commit 8ef8286689c6b5bc76212437b85bdd2ba749ee44 things work fine, numa 
policy is respected...

the very next commit bc56bba8f31bd99f350a5ebfd43d50f411b620c7 breaks shm 
badly causing the test program to oops the kernel.

commit 516dffdcd8827a40532798602830dfcfc672294c fixes that breakage but 
numa policy is no longer respected.

i've added the authors of those two commits to the recipient list and 
reattached the test program.  hopefully someone can shed light on the 
problem.

-dean

#include <sys/mman.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>


static void *alloc_arena_shm(size_t arena_size, unsigned flags)
{
	FILE *fh;
	char buf[512];
	size_t huge_page_size;
	char *p;
	int shmid;
	void *arena;

	// find Hugepagesize in /proc/meminfo
	if ((fh = fopen("/proc/meminfo", "r")) == NULL) {
		perror("open(/proc/meminfo)");
		exit(1);
	}
	for (;;) {
		if (fgets(buf, sizeof(buf)-1, fh) == NULL) {
			fprintf(stderr, "didn't find Hugepagesize in /proc/meminfo");
			exit(1);
		}
		buf[sizeof(buf)-1] = '\0';
		if (strncmp(buf, "Hugepagesize:", 13) == 0) break;
	}
	p = strchr(buf, ':') + 1;
	huge_page_size = strtoul(p, 0, 0) * 1024;
	fclose(fh);

	// round the size up to multiple of huge_page_size
	arena_size = (arena_size + huge_page_size - 1) & ~(huge_page_size - 1);

	shmid = shmget(IPC_PRIVATE, arena_size, IPC_CREAT|IPC_EXCL|flags|0600);
	if (shmid == -1) {
		perror("shmget");
		exit(1);
	}

	arena = shmat(shmid, NULL, 0);
	if (arena == (void *)-1) {
		perror("shmat");
		exit(1);
	}

	if (shmctl(shmid, IPC_RMID, 0) == -1) {
		perror("shmctl warning");
	}

	return arena;
}

int main(int argc, char **argv)
{
char buf[1024];
const size_t sz = 64*1024*1024;
void *arena = alloc_arena_shm(sz, SHM_HUGETLB);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);

arena = alloc_arena_shm(sz, 0);
memset(arena, 0, sz);
snprintf(buf, sizeof(buf), "grep ^%llx /proc/%d/numa_maps", (unsigned long long)arena, (int)getpid());
system(buf);

return 0;
}


Re: [PATCH] Introduce O_CLOEXEC (take >2)

2007-06-09 Thread dean gaudet
nice.  i proposed something like this 8 or so years ago... the problem is 
that you've also got to deal with socket(2), socketpair(2), accept(2), 
pipe(2), dup(2), dup2(2), fcntl(F_DUPFD)... everything which creates new 
fds.

really what is desired is fork/clone with selective duping of fds.  i.e. 
you supply the list of what will become fd 0,1,2 in the child.
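
(posix_spawn's file actions already give roughly that shape for the
standard fds, though they don't solve the "close everything else" half
of the problem.  a minimal sketch, error handling omitted:)

	#include <spawn.h>
	#include <sys/types.h>

	extern char **environ;

	static pid_t spawn_with_fds(const char *path, char *const argv[],
				    int in_fd, int out_fd, int err_fd)
	{
		posix_spawn_file_actions_t fa;
		pid_t pid;

		posix_spawn_file_actions_init(&fa);
		posix_spawn_file_actions_adddup2(&fa, in_fd, 0);	/* child fd 0 */
		posix_spawn_file_actions_adddup2(&fa, out_fd, 1);	/* child fd 1 */
		posix_spawn_file_actions_adddup2(&fa, err_fd, 2);	/* child fd 2 */
		posix_spawn(&pid, path, &fa, NULL, argv, environ);
		posix_spawn_file_actions_destroy(&fa);
		return pid;
	}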

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21 numa policy and huge pages not working

2007-06-09 Thread dean gaudet
On Tue, 15 May 2007, William Lee Irwin III wrote:

> On Tue, May 15, 2007 at 10:41:06PM -0700, dean gaudet wrote:
> > prior to 2.6.21 i could "numactl --interleave=all" and use SHM_HUGETLB and 
> > the interleave policy would be respected.  as of 2.6.21 it doesn't seem to 
> > respect the policy on SHM_HUGETLB request.
> > see test program below.
> > output from pre-2.6.21:
> > 2ab19620 interleave=0-3 file=/2\040(deleted) huge dirty=32 N0=8 N1=8 
> > N2=8 N3=8
> > 2ab19a20 default file=/SYSV\040(deleted) dirty=16384 active=0 
> > N0=4096 N1=4096 N2=4096 N3=4096
> > output from 2.6.21:
> > 2b49b1c0 default file=/10\040(deleted) huge dirty=32 N3=32
> > 2b49b5c0 default file=/SYSV\040(deleted) dirty=16384 active=0 
> > N0=4096 N1=4096 N2=4096 N3=4096
> > was this an intentional behaviour change?  it seems to be only affecting 
> > SHM_HUGETLB allocations.  (i haven't tested hugetlbfs yet.)
> > run with "numactl --interleave=all ./shmtest"
> 
> This was not intentional. I'll search for where it broke.

any luck?  i just tested with 2.6.22-rc4 and it's still broken... hmm 
maybe i should learn how to git bisect.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 43/69] i386: HPET, check if the counter works

2007-06-05 Thread dean gaudet
ugh... do not send email before breakfast.  do not send email before 
breakfast.  nevermind :)

-dean

On Tue, 5 Jun 2007, dean gaudet wrote:

> the HPET specification allows for HPETs with *much* lower resolution than 
> 50us.  in fact Fmin is 10Hz iirc.  (sorry to jump in so late, but i'm 
> about a month behind on the list.)
> 
> -dean
> 
> On Mon, 21 May 2007, Chris Wright wrote:
> 
> > -stable review patch.  If anyone has any objections, please let us know.
> > -
> > 
> > From: Thomas Gleixner <[EMAIL PROTECTED]>
> > 
> > Some systems have a HPET which is not incrementing, which leads to a
> > complete hang. Detect it during HPET setup.
> > 
> > Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> > Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
> > ---
> > [chrisw: Why is this not upstream yet?]
> > 
> > ---
> >  arch/i386/kernel/hpet.c |   24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> > 
> > --- linux-2.6.21.1.orig/arch/i386/kernel/hpet.c
> > +++ linux-2.6.21.1/arch/i386/kernel/hpet.c
> > @@ -226,7 +226,8 @@ int __init hpet_enable(void)
> >  {
> > unsigned long id;
> > uint64_t hpet_freq;
> > -   u64 tmp;
> > +   u64 tmp, start, now;
> > +   cycle_t t1;
> >  
> > if (!is_hpet_capable())
> > return 0;
> > @@ -273,6 +274,27 @@ int __init hpet_enable(void)
> > /* Start the counter */
> > hpet_start_counter();
> >  
> > +   /* Verify whether hpet counter works */
> > +   t1 = read_hpet();
> > +   rdtscll(start);
> > +
> > +   /*
> > +* We don't know the TSC frequency yet, but waiting for
> > +* 200000 TSC cycles is safe:
> > +* 4 GHz == 50us
> > +* 1 GHz == 200us
> > +*/
> > +   do {
> > +   rep_nop();
> > +   rdtscll(now);
> > +   } while ((now - start) < 200000UL);
> > +
> > +   if (t1 == read_hpet()) {
> > +   printk(KERN_WARNING
> > +  "HPET counter not counting. HPET disabled\n");
> > +   goto out_nohpet;
> > +   }
> > +
> > /* Initialize and register HPET clocksource
> >  *
> >  * hpet period is in femto seconds per cycle
> > 
> > -- 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 43/69] i386: HPET, check if the counter works

2007-06-05 Thread dean gaudet
the HPET specification allows for HPETs with *much* lower resolution than 
50us.  in fact Fmin is 10Hz iirc.  (sorry to jump in so late, but i'm 
about a month behind on the list.)

-dean

On Mon, 21 May 2007, Chris Wright wrote:

> -stable review patch.  If anyone has any objections, please let us know.
> -
> 
> From: Thomas Gleixner <[EMAIL PROTECTED]>
> 
> Some systems have a HPET which is not incrementing, which leads to a
> complete hang. Detect it during HPET setup.
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
> ---
> [chrisw: Why is this not upstream yet?]
> 
> ---
>  arch/i386/kernel/hpet.c |   24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> --- linux-2.6.21.1.orig/arch/i386/kernel/hpet.c
> +++ linux-2.6.21.1/arch/i386/kernel/hpet.c
> @@ -226,7 +226,8 @@ int __init hpet_enable(void)
>  {
>   unsigned long id;
>   uint64_t hpet_freq;
> - u64 tmp;
> + u64 tmp, start, now;
> + cycle_t t1;
>  
>   if (!is_hpet_capable())
>   return 0;
> @@ -273,6 +274,27 @@ int __init hpet_enable(void)
>   /* Start the counter */
>   hpet_start_counter();
>  
> + /* Verify whether hpet counter works */
> + t1 = read_hpet();
> + rdtscll(start);
> +
> + /*
> +  * We don't know the TSC frequency yet, but waiting for
> +  * 200000 TSC cycles is safe:
> +  * 4 GHz == 50us
> +  * 1 GHz == 200us
> +  */
> + do {
> + rep_nop();
> + rdtscll(now);
> + } while ((now - start) < 200000UL);
> +
> + if (t1 == read_hpet()) {
> + printk(KERN_WARNING
> +"HPET counter not counting. HPET disabled\n");
> + goto out_nohpet;
> + }
> +
>   /* Initialize and register HPET clocksource
>*
>* hpet period is in femto seconds per cycle
> 
> -- 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, RFT, v4] sata_mv: convert to new EH

2007-05-26 Thread dean gaudet
On Fri, 25 May 2007, Jeff Garzik wrote:

> Already uncovered and fixed a few bugs in v3.
> 
> Here's v4 of the sata_mv new-EH patch.

you asked for test results with 2.6.21.3 ... that seems to boot fine,
and i've tested reading from the disks only and it seems to be working
fine.  ditto for 2.6.22-rc3.

but 2.6.22-rc3 + your v4 patch fails... i'll send you the serial console
outputs offline.

-dean




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   >