Time zones

2018-12-15 Thread Raymond Jennings
Is it possible to tell the kernel what time zone the RTC is in?

Right now it appears to assume that it's always in UTC, and this
causes a few headaches during the boot process.

As it is, I tried to file a bug to have OpenRC activate hwclock
earlier, but it was rejected.
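
For reference, a sketch of the arithmetic involved when the RTC keeps local time rather than UTC. As far as I understand the settimeofday(2) man page, Linux accepts a timezone through that call's `tz` argument, and `hwclock --systz` is the usual tool for passing it at boot; treat both as pointers to check, not as a confirmed answer to the question above. The helper name `rtc_local_to_utc` is made up for this illustration.

```c
#include <assert.h>
#include <time.h>

/*
 * Convert an RTC reading kept in local time to UTC.
 * minutes_west follows the tz_minuteswest convention: minutes west of
 * Greenwich, so UTC+1 is -60 and UTC-8 is 480.
 */
static time_t rtc_local_to_utc(time_t rtc_local, int minutes_west)
{
    /* Sanity check: real offsets stay well inside one day. */
    assert(minutes_west > -24 * 60 && minutes_west < 24 * 60);
    return rtc_local + (time_t)minutes_west * 60;
}
```

With an RTC on UTC+1 the reading is one hour ahead of UTC, so the helper subtracts 3600 seconds; with UTC-8 it adds 28800.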


Aborting a core dump on a fatal signal

2017-09-25 Thread Raymond Jennings
Would there be any benefit to allowing an in-progress core dump to be
aborted if the dumping process receives a fatal signal?

Example:

Process segfaults, starts dumping core, but it has a lot of virtual
memory allocated so it promptly leads to a queue-clogging deluge of I/O
that takes in some cases several minutes to finish.

In between the time where it snags the crash signal and the time when
it finishes dumping and terminates, I'd like to be able to send it a
SIGKILL or something to abort the core dump.
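
A rough userspace analogy of the request, not kernel code: a long chunked write loop that polls an abort flag between chunks and bails out early. The names `dump_region`, `write_chunk`, `kill_requested`, and `CHUNK` are invented for illustration; in the kernel the corresponding check would presumably be something like `fatal_signal_pending(current)` inside the dump loop.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define CHUNK 4096  /* hypothetical chunk size */

/* Set asynchronously (e.g. from a signal handler) to request an abort. */
static volatile sig_atomic_t kill_requested;

/* Hypothetical sink: pretend to write one chunk; returns bytes "written". */
static size_t write_chunk(const unsigned char *buf, size_t len)
{
    (void)buf;
    return len;
}

/*
 * Dump len bytes in CHUNK-sized pieces, checking the abort flag before
 * each piece. Returns the number of bytes dumped before finishing or
 * being aborted.
 */
static size_t dump_region(const unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        if (kill_requested)
            break;          /* abandon the rest of the dump */
        size_t n = len - done < CHUNK ? len - done : CHUNK;
        done += write_chunk(buf + done, n);
    }
    return done;
}
```

The point of checking per chunk is that the worst-case delay after the kill request is one chunk of I/O instead of the whole dump.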




Re: [PATCH 0/3] ABI CHANGE!!! Remove questionable remote SP reads

2016-10-04 Thread Raymond Jennings
My personal opinion is that even looking at esp/rsp is asking for 
trouble.  The only reliable information is VM_STACK or another VM flag 
that makes the area expand in response to stack growth.


Besides, userspace could always play funky trampoline games with the 
stack pointer, or even dynamically expand the stack by doing a malloc 
if a stack overflow draws near, which would put the stack in the data 
section temporarily.


As long as esp is in the bounds of a valid VMA, my vote is that we 
should consider it undefined how the task uses it.
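
To make the suggestion concrete, here is a minimal sketch of classifying a mapping by its flags instead of by where the stack pointer happens to be. `struct region`, `REGION_GROWSDOWN`, and both helpers are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_GROWSDOWN 0x1u  /* hypothetical stand-in for a stack-growth flag */

/* Hypothetical, simplified stand-in for a VMA: [start, end) plus flags. */
struct region {
    uintptr_t start;
    uintptr_t end;
    unsigned int flags;
};

/* True if addr lies inside the half-open range [start, end). */
static bool region_contains(const struct region *r, uintptr_t addr)
{
    assert(r != NULL);
    return addr >= r->start && addr < r->end;
}

/*
 * Classify a region as stack-like purely from its flags, ignoring where
 * the stack pointer currently points.
 */
static bool region_is_stack(const struct region *r)
{
    assert(r != NULL);
    return (r->flags & REGION_GROWSDOWN) != 0;
}
```

Under this view a stack pointer that has wandered into a heap allocation is still valid as long as `region_contains` holds for some mapping; only `region_is_stack` says anything about stack growth.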


On Mon, Oct 3, 2016 at 4:17 PM, Linus Torvalds wrote:
On Mon, Oct 3, 2016 at 4:08 PM, Andy Lutomirski wrote:

 Ping!

 We need to decide fairly soon whether to apply these (or perhaps just
 patch 1 or just patches 2 and 3) for 4.9.  For any parts that aren't
 applied, I'll send quick fixups to pin the stack in the offending
 code.


I think we should apply it. Hopefully nothing uses it, and nobody will
notice. And if somebody *does* notice, the sooner we find out, the
better.

 Linus




Re: BUG_ON() in workingset_node_shadows_dec() triggers

2016-10-04 Thread Raymond Jennings
On Mon, Oct 3, 2016 at 9:12 PM, Linus Torvalds wrote:
On Mon, Oct 3, 2016 at 9:07 PM, Andrew Morton wrote:


 Well, it's a VM_BUG_ON and few people run with CONFIG_DEBUG_VM.


Ehh. If by "few people" you mean "pretty much everybody", you'd be
right, but your choice of wording would be somewhat misleading,
wouldn't you say?

Hint: here's a line from the standard Fedora kernel config:

CONFIG_DEBUG_VM=y

so *no*. VM_BUG_ON() is no less deadly than a regular BUG_ON(). It
just allows some people to build smaller kernels, but apparently
distro people would rather have debugging than save a few kB of RAM.

The VM debugging code has VM_WARN_ON() and VM_WARN_ON_ONCE() for
people who want to get an "oops, my assumptions were wrong" warning.

Killing machines because somebody made an assumption that was wrong 
is not ok.


Killing the machine is ok if we have a situation where there literally
is no other choice.


For the curious:

This would include situations like

1.  The kernel is confused and further processing would result in 
undefined behavior (like bluesmoke detecting PCC for example)


2.  Security hazards where we'd leak stuff if we don't shut down.

?
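
The distinction being argued over can be sketched in userspace C. This is only an illustration of "warn once and keep running" versus "stop because there is no safe way to continue"; the `SKETCH_` macros, `warnings_emitted`, and `counter_dec` are invented here and are not the kernel's implementations.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int warnings_emitted;  /* counts reports, for illustration only */

/* Report a violated assumption once per call site, then carry on. */
#define SKETCH_WARN_ON_ONCE(cond) do {                              \
        static int warned_;                                         \
        if ((cond) && !warned_) {                                   \
            warned_ = 1;                                            \
            warnings_emitted++;                                     \
            fprintf(stderr, "warning: %s (%s:%d)\n",                \
                    #cond, __FILE__, __LINE__);                     \
        }                                                           \
    } while (0)

/* Give up entirely: only for states with no safe way to continue. */
#define SKETCH_BUG_ON(cond) do {                                    \
        if (cond) {                                                 \
            fprintf(stderr, "fatal: %s (%s:%d)\n",                  \
                    #cond, __FILE__, __LINE__);                     \
            abort();                                                \
        }                                                           \
    } while (0)

/* Decrement a counter that should never go below zero. */
static int counter_dec(int count)
{
    SKETCH_WARN_ON_ONCE(count <= 0);  /* wrong assumption: warn, keep running */
    return count > 0 ? count - 1 : 0;
}
```

Here a bad decrement is clamped and reported once; `SKETCH_BUG_ON` would be reserved for the two situations listed above.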


  Linus




Removal of wchan and top

2015-11-13 Thread Raymond Jennings
Hey, don't know if this is important enough, but could I request that 
the removal of wchan be reverted, or at least wrapped in an optional 
config setting?


I happen to enjoy monitoring this information with a secure top, and 
it's useful for understanding how my system works and I've used it a 
few times for debugging.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Updated scalable urandom patchkit

2015-10-12 Thread Raymond Jennings



On Mon, Oct 12, 2015 at 7:46 PM, Theodore Ts'o  wrote:

On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:
 > Segregating abusers solves both problems.  If we do this then we don't
 > need to drop the locks from the nonblocking pool, which solves the
 > security problem.

 Er, sort of.  I still think my points were valid, but they're
 about a particular optimization suggestion you had.  By avoiding
 the need for the optimization, the entire issue is mooted.


Sure, I'm not in love with anyone's particular optimization, whether
it's mine, yours, or Andi's.  I'm just trying to solve the scalability
problem while also trying to keep the code maintainable and easy to
understand (and over the years we've actually made things worse, to
the extent that having a single mixing for the input and output pools
is starting to be more of a problem than a feature, since we're coding
in a bunch of exceptions when it's the output pool, etc.).

So if we can solve a problem by routing around it, that's fine in my
book.

 You have to copy the state *anyway* because you don't want it overwritten
 by the ChaCha output, so there's really no point storing the constants.

 (Also, ChaCha has a simpler input block structure than Salsa20; the
 constants are all adjacent.)


We're really getting into low-level implementations here, and I think
it's best to worry about these sorts of things when we have a patch to
review.

 (Note: one problem with ChaCha specifically is that it needs 16x32 bits
 of registers, and Arm32 doesn't quite have enough.  We may want to provide
 an arch CPRNG hook so people can plug in other algorithms with good
 platform support, like x86 AES instructions.)


So while a ChaCha20-based CRNG should be faster than a SHA-1 based
CRNG, and I consider this a good thing, for me speed is **not** more
important than keeping the underlying code maintainable and simple.
This is one of the reasons why I looked at, and then discarded, using
x86 accelerated AES as the basis for a CRNG.  Setting up AES so that
it can be used easily with or without hardware acceleration looks very
complicated to do in a cross-architectural way, and I don't want to
drag in all of the crypto layer for /dev/random.

 The same variables can be used (with different parameters) to decide if
 we want to get out of mitigation mode.  The one thing to watch out for
 is that "cat /dev/sdX" may have some huge pauses once
 the buffer cache fills.  We don't want to forgive after too small a
 fixed interval.


At least initially, once we go into mitigation mode for a particular
process, it's probably safer to simply not exit it.

 Finally, we have the issue of where to attach this rate-limiting structure
 and crypto context.  My idea was to use the struct file.  But now that
 we have getrandom(2), it's harder.  mm, task_struct, signal_struct, what?


I'm personally more inclined to keep it with the task struct, so that
different threads will use different crypto contexts, just from a
simplicity point of view since we won't need to worry about locking.

Since many processes don't use /dev/urandom or getrandom(2) at all,
the first time they do, we'd allocate a structure and hang it off the
task_struct.  When the process exits, we would explicitly memzero it
and then release the memory.
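
The lifecycle described above (allocate on first use, hang it off the task, scrub and free at exit) could look roughly like this in plain C. Everything here is a hypothetical stand-in: `struct task`, `struct crng_ctx`, its field sizes, and the helper names are invented, and the kernel would presumably use its own allocator and a helper such as `memzero_explicit()` instead of the hand-rolled wipe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-task CRNG state; the field sizes are placeholders. */
struct crng_ctx {
    unsigned char key[32];
    unsigned long bytes_served;
};

/* Hypothetical stand-in for the task structure. */
struct task {
    struct crng_ctx *crng;   /* NULL until the task first asks for randomness */
};

/* Allocate the context on first use; later calls reuse it. */
static struct crng_ctx *task_get_crng(struct task *t)
{
    assert(t != NULL);
    if (!t->crng)
        t->crng = calloc(1, sizeof(*t->crng));
    return t->crng;          /* may be NULL if allocation failed */
}

/* Wipe through a volatile pointer so the stores are not optimized away. */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *v = p;

    while (n--)
        *v++ = 0;
}

/* On task exit: scrub the key material, then release the memory. */
static void task_exit_crng(struct task *t)
{
    assert(t != NULL);
    if (t->crng) {
        secure_zero(t->crng, sizeof(*t->crng));
        free(t->crng);
        t->crng = NULL;
    }
}
```

Because each task owns its context, nothing in this sketch needs a lock, which is the simplicity argument made above.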


 (Post-finally, do we want this feature to be configurable under
 CONFIG_EMBEDDED?  I know keeping the /dev/random code size small is
 a specific design goal, and abuse mitigation is optional.)


Once we code it up and can see how many bytes this takes, we can have
this discussion.  I'll note that ChaCha20 is much more compact than SHA1:

   text    data     bss     dec     hex filename
   4230       0       0    4230    1086 /build/ext4-64/lib/sha1.o
   1152     304       0    1456     5b0 /build/ext4-64/crypto/chacha20_generic.o


... and I've thought about this as being the first step towards
potentially replacing SHA1 with something ChaCha20 based, in light of
the SHAppening attack.  Unfortunately, BLAKE2s is similar to ChaCha
only from a design perspective, not an implementation perspective.
Still, I suspect that, just looking at the crypto primitives, even if we
need to include two independent copies of the ChaCha20 core crypto and
the BLAKE2s core crypto, it still should be about half the size of the
SHA-1 crypto primitive.

And from the non-plumbing side of things, Andi's patchset increases
the size of /dev/random by a bit over 6%, or 974 bytes from a starting
base of 15719 bytes.  It ought to be possible to implement a ChaCha20
based CRNG (ignoring the crypto primitives) in less than 974 bytes of
x86_64 assembly.  :-)

- Ted


This might be stupid, 

Re: can't oom-kill zap the victim's memory?

2015-09-20 Thread Raymond Jennings

On 09/20/15 11:05, Linus Torvalds wrote:

On Sun, Sep 20, 2015 at 5:56 AM, Oleg Nesterov  wrote:

In this case the workqueue thread will block.

What workqueue thread?

pagefault_out_of_memory ->
   out_of_memory ->
      oom_kill_process

as far as I can tell, this can be called by any task. Now, that
pagefault case should only happen when the page fault comes from user
space, but we also have

__alloc_pages_slowpath ->
   __alloc_pages_may_oom ->
      out_of_memory ->
         oom_kill_process

which can be called from just about any context (but atomic
allocations will never get here, so it can schedule etc).


I think in this case the oom killer should just slap a SIGKILL on the 
task and then back out, and whatever needed the memory should just wait 
patiently for the sacrificial lamb to commit seppuku.


Which, btw, we should IMO encourage ASAP in the context of the lamb by 
having anything potentially locky or semaphory pay attention to if the 
task in question has a fatal signal pending, and if so, drop everything 
and run like hell so that the task can cough up any locks or semaphores.
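
As a userspace illustration of that idea, here is a lock wait that gives up as soon as a fatal signal is pending. The spinlock, `fatal_pending`, and the `demo_` function names are invented for this sketch; the kernel has its own killable primitives (for example `mutex_lock_killable()`), and this does not show how they are implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-task flag, set when a fatal signal is queued. */
static atomic_bool fatal_pending;

/* A minimal spinlock built on a C11 atomic flag. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/*
 * Try to take the lock, but give up with -EINTR as soon as a fatal
 * signal is pending, so the caller can unwind and release what it holds.
 * Returns 0 with the lock held on success.
 */
static int demo_lock_killable(void)
{
    while (atomic_flag_test_and_set(&demo_lock)) {
        if (atomic_load(&fatal_pending))
            return -EINTR;   /* stop waiting; the task is dying */
    }
    return 0;
}

static void demo_unlock(void)
{
    atomic_flag_clear(&demo_lock);
}
```

A caller that gets `-EINTR` is expected to release whatever else it holds and return, which is what lets the dying task free its memory.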

So what's your point? Explain again just how do you guarantee that you
can take the mmap_sem.

Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org


Also, I observed that a task in the middle of dumping core doesn't 
respond to signals while it's dumping, and I would guess that might be 
the case even if the task receives a SIGKILL from the OOM handler.  Just 
a potential observation.




Re: can't oom-kill zap the victim's memory?

2015-09-19 Thread Raymond Jennings

On 09/19/15 15:24, Linus Torvalds wrote:

On Sat, Sep 19, 2015 at 8:03 AM, Oleg Nesterov  wrote:

+
+static void oom_unmap_func(struct work_struct *work)
+{
+   struct mm_struct *mm = xchg(&oom_unmap_mm, NULL);
+
+   if (!atomic_inc_not_zero(&mm->mm_users))
+       return;
+
+   // If this is not safe we can do use_mm() + unuse_mm()
+   down_read(&mm->mmap_sem);

I don't think this is safe.

What makes you sure that we might not deadlock on the mmap_sem here?
For all we know, the process that is going out of memory is in the
middle of a mmap(), and already holds the mmap_sem for writing. No?


Potentially stupid question that others may be asking: Is it legal to 
return EINTR from mmap() to let a SIGKILL from the OOM handler punch the 
task out of the kernel and back to userspace?


(sorry for the dupe btw, new email client snuck in html and I got bounced)


So at the very least that needs to be a trylock, I think. And I'm not
sure zap_page_range() is ok with the mmap_sem only held for reading.
Normally our rule is that you can *populate* the page tables
concurrently, but you can't tear them down.

 Linus
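
The trylock point can be illustrated with a toy reader/writer lock: a reader that cannot get in backs off immediately instead of sleeping behind a writer that may never finish. The state encoding and the function names below are invented for this sketch; the kernel's counterpart for the read side is `down_read_trylock()`, whose implementation is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical reader/writer lock state:
 *   -1  a writer holds it,  0  free,  n > 0  n readers hold it.
 */
static atomic_int rw_state;

/* Take the lock for reading without waiting; false if a writer holds it. */
static bool read_trylock(void)
{
    int cur = atomic_load(&rw_state);

    while (cur >= 0) {
        /* On failure, cur is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&rw_state, &cur, cur + 1))
            return true;
    }
    return false;            /* writer active: back off instead of blocking */
}

static void read_unlock(void)
{
    atomic_fetch_sub(&rw_state, 1);
}

/* Take the lock for writing without waiting; only succeeds when free. */
static bool write_trylock(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&rw_state, &expected, -1);
}

static void write_unlock(void)
{
    atomic_store(&rw_state, 0);
}
```

In the scenario above, the victim stuck in mmap() plays the writer, and the unmap worker would call the read trylock and simply skip the zap when it returns false.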


Re: First kernel patch (optimization)

2015-09-18 Thread Raymond Jennings

On 09/18/15 00:42, Greg KH wrote:

On Thu, Sep 17, 2015 at 11:12:51PM -0400, Theodore Ts'o wrote:

On Wed, Sep 16, 2015 at 01:26:51PM -0400, Josh Boyer wrote:

That isn't true.  It helps the submitter understand the workflow and
expectations.  What you meant to say is that it doesn't help you.

The problem is that workflow isn't the hard part.  It's the part that
can be taught most easily, sure.  But people seem to get really hung
up on it, and I fear that we have people who never progress beyond
sending trivial patches and spelling fixes and white space fixes and
micro-optimizations.

If the "you too can be a kernel developer" classes and web sites and
tutorials also taught people how to take performance measurements, and
something about the scientific measurement, that would be something.
Or if it taught people how to create tests and to run regression
testing.  Or if it taught people how to try to do fuzz testing, and
then once they find a sequence which causes crash, how to narrow down
the failure to a specific part of the kernel, and how to fix and
confirm that the kernel no longer crashes with the fix --- that would
be useful.

If they can understand kernel code; if they can understand the
scientific measurement; if they can understand how to do performance
measurements --- being able to properly format patches is something
which most kernel developers can very easily guide a new contributor
to do correctly.  Or in the worst case, it doesn't take much time for
me to fix a whitespace problem and just tell the contributor --- by
the way, I fixed up this minor issue; could you please make sure you
do this in the future?

But if a test hasn't been tested, or if the contributor thinks it's a
micro-optimization, but it actually takes more CPU time and/or more
stack space and/or bloats the kernel --- that's much more work for the
kernel maintainer to have to deal with when reviewing a patch.

So I have a very strong disagreement with the belief that teaching
people the workflow is the more important thing.  In my mind, that's
like first focusing on how to properly fill out a golf
score card, and the etiquette and traditions around handicaps, etc.
--- before making sure the prospective player is good at putting and
driving.  Personally, I'm terrible at putting and driving, so spending
a lot of time learning how to fill out a golf score card would be a
waste of my time.

A good kernel programmer has to understand systems thinking; how to
figure out abstractions and when it's a good thing to add a new layer
of abstraction and when it's better to rework an existing abstraction
layer.

If we have someone who knows the workflow, but who doesn't
understand systems thinking, or how to do testing, then what?  Great,
we've just created another Nick Krause.  Do you think encouraging a
Nick Krause helps anyone?

If people really are hung up on learning the workflow, I don't mind if
they want to learn that part and send some silly micro-optimization or
spelling fix or whitespace fix.  But it's really, really important
that they move beyond that.  And if they aren't capable of moving
beyond that, trying to inflate our recruitment numbers by encouraging
someone who can only do trivial fixes means that we may get what we
can easily measure --- but it may not be what we really need as a
community.

Ted, you are full of crap.

Where do you think that "new developers" come from?  Do they show up in
our inbox, with full knowledge of kernel internals and OS theory yet
they somehow just can't grasp how to submit a patch correctly?  Yes,
they sometimes rarely do.  But for the majority of people who got into
Linux, that is not the case at all.

People need to start with something simple, and easy, to get over the
hurdles of:
- generating a patch
- sending an email
- fixing the email client as it just corrupted the patch
- fix the subject line as it was incorrect
- fix the changelog as it was missing
- fix the email client again as it corrupted the patch in a
  different way
- giving up on using a web email client as it just will not work
- figuring out who to send the patch to
- fixing the email client as the mailing list bounced the email

Those are non-trivial tasks.  And by starting with "remove this space"
you take the worry away from the specific content of the patch, and let
them worry about the "hard" part first.

+1 for this.

For example, I for one cannot tell you how many times gmail snuck html 
sections into my outgoing emails before I finally caught it red handed 
and switched to using a local native client.



Then, after all of the above is finished, and working, then they can
start submitting real patches, that do real things, in patch series, as
they can focus on the content much more, as the problems of how to make
the patch into an acceptable format is not an issue anymore.


Did anyone read Linus Torvalds's post that 

Re: First kernel patch (optimization)

2015-09-18 Thread Raymond Jennings

On 09/18/15 00:42, Greg KH wrote:

On Thu, Sep 17, 2015 at 11:12:51PM -0400, Theodore Ts'o wrote:

On Wed, Sep 16, 2015 at 01:26:51PM -0400, Josh Boyer wrote:

That isn't true.  It helps the submitter understand the workflow and
expectations.  What you meant to say is that it doesn't help you.

The problem is that workflow isn't the hard part.  It's the part that
can be taught most easily, sure.  But people seem to get really hung
up on it, and I fear that we have people who never progress beyond
sending trivial patches and spelling fixes and white space fixes and
micro-optimizations.

If the "you too can be a kernel developer" classes and web sites and
tutorials also taught people how to take performance measurements, and
something about the scientific measurement, that would be something.
Or if it taught people how to create tests and to run regression
testing.  Or if it taught people how to try to do fuzz testing, and
then once they find a sequence which causes crash, how to narrow down
the failure to a specific part of the kernel, and how to fix and
confirm that the kernel no longer crashes with the fix --- that would
be useful.

If they can understand kernel code; if they can understand the
scientific measurement; if they can understand how to do performance
measurements --- being able to properly format patches is something
which most kernel developers can very easily guide a new contributor
to do correctly.  Or in the worst case, it doesn't take much time for
me to fix a whitespace problem and just tell the contributor --- by
the way, I fixed up this minor issue; could you please make sure you
do this in the future?

But if a test hasn't been tested, or if the contributor things it's a
micro-optimization, but it actually takes more CPU time and/or more
stack space and/or bloats the kernel --- that's much more work for the
kernel maintainer to have to deal with when reviewing a patch.

So I have a very strong disagreement with the belief that teaching
people the workflow is the more important thing.  In my mind, that's
like first focusing on how to properly fill out a golf
score card, and the etiquette and traditions around handicaps, etc
--- before making sure the prospective player is good at putting and
driving.  Personally, I'm terrible at putting and driving, so spending
a lot of time learning how to fill out a golf score card would be a
waste of my time.

A good kernel programmer has to understand systems thinking; how to
figure out abstractions and when it's a good thing to add a new layer
of abstraction and when it's better to rework an existing abstraction
layer.

If we have someone who knows the workflow, but who doesn't
understand systems thinking, or how to do testing, then what?  Great,
we've just created another Nick Krause.  Do you think encouraging a
Nick Krause helps anyone?

If people really are hung up on learning the workflow, I don't mind if
they want to learn that part and send some silly micro-optimization or
spelling fix or whitespace fix.  But it's really, really important
that they move beyond that.  And if they aren't capable of moving
beyond that, trying to inflate our recruitment numbers by encouraging
someone who can only do trivial fixes means that we may get what we
can easily measure --- but it may not be what we really need as a
community.

Ted, you are full of crap.

Where do you think that "new developers" come from?  Do they show up in
our inbox, with full knowledge of kernel internals and OS theory yet
they somehow just can't grasp how to submit a patch correctly?  Yes,
they sometimes do, but rarely.  But for the majority of people who got into
Linux, that is not the case at all.

People need to start with something simple, and easy, to get over the
hurdles of:
- generating a patch
- sending an email
- fixing the email client as it just corrupted the patch
- fix the subject line as it was incorrect
- fix the changelog as it was missing
- fix the email client again as it corrupted the patch in a
  different way
- giving up on using a web email client as it just will not work
- figuring out who to send the patch to
- fixing the email client as the mailing list bounced the email

Those are non-trivial tasks.  And by starting with "remove this space"
you take the worry away from the specific content of the patch, and let
them worry about the "hard" part first.

+1 for this.

For example, I for one cannot tell you how many times gmail snuck html 
sections into my outgoing emails before I finally caught it red handed 
and switched to using a local native client.



Then, after all of the above is finished, and working, then they can
start submitting real patches, that do real things, in patch series, as
they can focus on the content much more, as the problems of how to make
the patch into an acceptable format is not an issue anymore.


Did anyone read Linus Torvalds's post that 

Re: First kernel patch (optimization)

2015-09-16 Thread Raymond Jennings



On 09/16/15 09:40, Theodore Ts'o wrote:

On Wed, Sep 16, 2015 at 05:03:39PM +0100, Eric Curtin wrote:

Hi Greg,

As I said in the subject of the mail (which I have since been told I
shouldn't have done), I'm a noob to kernel code. I tried to pick
off something super simple to just see what the process of getting a
patch in is. Youtube videos and documentation only get you so far.

From reading your response, should I refrain from sending in these
micro-optimizations in future? Getting in smaller patches is easier
for me as I only do this in my spare time, which I don't have a lot
of!

What I'd ask you to consider is what your end goal is.  Is it just to
collect a scalp (woo hoo!  I've gotten a patch into the kernel)?  Or
is it to actually make things better for yourself or other users?  Or
are you trying to make yourself more employable, etc.?

It could well be that he's wanting to practice getting used to the
development process.


https://lkml.org/lkml/2004/12/20/255

Micro-optimizations are often not particularly useful for anything
other than the first goal, and they really don't help anyone.

If you're just doing this in your spare time, then I hope
you are being choosy about what's the best way to use your spare time,
so the question of what your goals are going to be is a very important
thing for you to figure out.  Regardless of whether it's worthwhile to
get this patch into the kernel, doing any *more* micro-optimizations
is probably not a good use of your time or anyone else's.

I'd strongly encourage you to move on to something more than just
micro-optimizations as quickly as possible.

Tytso is right here.  If you want to be useful, you should find something
with real impact once you've learned the ropes.





Best regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: stop breaking dosemu (Re: x86/kconfig/32: Rename CONFIG_VM86 and default it to 'n')

2015-09-04 Thread Raymond Jennings

On 09/04/15 14:30, Stas Sergeev wrote:

05.09.2015 00:16, Stas Sergeev wrote:

I agree. vm86() is a mess.
My point is that its risky parts and useless functionality
are _already_ known (even I can point to the particular code
parts that can simply be removed). As such, it simply had
to be re-visited and cleaned up to match at least 1 and 3
(and then maybe 5). This wasn't done, and the knob was
introduced _instead_ of doing this.

Grr, I mean it was disabled by default instead of doing this,
and the knob was only proposed, not added.


You can't just pull vm86 out of the kernel anyway.  dosemu is a 
userspace application that depends on it, so pulling this feature out 
would be a big fat regression, period.


I would personally rather not hear about how "it's a legacy program so 
its userbase is shrinking" used as any sort of excuse to ignore the fact 
that we shouldn't break userspace.


I can even say as a user that vm86 is important to me.

By all means, cleaning up vm86 is a good idea.  But removing it or 
fencing it off with a strong deprecation doesn't sound like the right idea.



Re: [GIT PULL] Ext3 removal, quota & udf fixes

2015-09-01 Thread Raymond Jennings

On 09/01/15 20:30, Albino B Neto wrote:

2015-08-31 23:53 GMT-03:00 Theodore Ts'o :

Yes, you can go back to ext3-only.  In fact, we do *not* automatically
upgrade the file system to use ext4-specific features.

So it's not just a "you can use ext4 instead" issue. Can you do that
*without* then forcing an upgrade forever on that partition? I'm not
sure the ext4 people are really even willing to guarantee that kind of
backwards compatibility.

Actually, we do guarantee this.  It's considered poor form to
automatically change the superblock to add new file system features in
a way that would break the ability for the user to roll back to an
older kernel.  This isn't just for ext3->ext4, but for new ext4
features such as metadata checksumming.  The user has to explicitly
enable the feature using "tune2fs -O new_feature /dev/sdXX".
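
To sketch why leaving features off keeps rollback possible: the ext2/3/4
superblock records its features in three bitmasks (compat, incompat and
ro_compat), and a driver that sees an incompat bit it does not know must
refuse the mount, while an unknown ro_compat bit only permits a read-only
mount.  The code below is a simplified userspace illustration of that rule,
not the kernel's implementation; the structure, the function name and the bit
values are all hypothetical.

```c
#include <stdint.h>

/* Illustrative feature bits; the values are hypothetical, not the on-disk ones. */
#define FEAT_INCOMPAT_FILETYPE   0x0001u
#define FEAT_INCOMPAT_EXTENTS    0x0002u
#define FEAT_RO_COMPAT_CSUM      0x0001u

enum mount_result { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSED };

/* The three feature bitmasks, as stored on disk or as understood by a driver. */
struct sb_features {
    uint32_t compat;
    uint32_t incompat;
    uint32_t ro_compat;
};

/*
 * Decide how a driver that understands `supported` may mount a filesystem
 * whose superblock carries `on_disk`.  Unknown compat bits are harmless and
 * are therefore not checked at all.
 */
static enum mount_result check_features(const struct sb_features *on_disk,
                                        const struct sb_features *supported)
{
    if (on_disk->incompat & ~supported->incompat)
        return MOUNT_REFUSED;   /* unknown incompat feature: cannot mount */
    if (on_disk->ro_compat & ~supported->ro_compat)
        return MOUNT_RO_ONLY;   /* unknown ro_compat feature: read-only */
    return MOUNT_RW;
}
```

Under these assumptions, an older driver keeps working for as long as nobody
sets a bit it does not understand, which is why new features have to be
enabled explicitly rather than automatically.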

Yeah!

2015-09-01 16:39 GMT-03:00 Austin S Hemmelgarn :

NO, it is not logical.  A vast majority of Android smartphones in the wild
use ext2, as do a very significant portion of embedded systems that don't
have room for the few hundred kilobytes of extra code that the ext4 driver
has in comparison to ext2.

Ext2 is used on a portion of embedded systems, and ext3 on many machines.


So basically the game plan is gutting ext3 because code-dupe with ext4, 
but keep ext2 because ext4 is too big for embedded to outright replace ext2?


Hmm...are there any embedded systems out there that use ext3 and can fit
the ext3 code but not the ext4 code?




Re: [GIT PULL] Ext3 removal, quota & udf fixes

2015-08-31 Thread Raymond Jennings

On 08/31/15 15:31, Raymond Jennings wrote:

On 08/31/15 14:37, Linus Torvalds wrote:

On Sun, Aug 30, 2015 at 11:19 PM, Jan Kara  wrote:

The biggest change in the pull is the removal of ext3 filesystem driver
(~28k lines removed).

I really am not ready to just remove ext3 without a lot of good
arguments. There might well be people who still use ext3 as ext3, and
don't want to update. I want more of a rationale for removal than "ext4
can read old ext3 filesystems".

I actually would agree that having two drivers for the same filesystem
is redundant and unneeded code duplication.


That said, I wouldn't mind myself if the ext4 driver were given a very 
grueling regression test to make sure it can actually handle old ext3 
systems as well as the ext3 driver can.  Just gutting an entire driver 
because another driver can handle it only makes sense if nothing can
go wrong, and here the potential for causing regressions is quite obvious.


I think also that we should remove the ext2 driver before we remove 
the ext3 driver.


My two cents.

Just to ask a general opinion:

Am I right that it's ok for kernel code to be organized how we (the 
developers) see fit as long as we don't break userspace or hardware in 
the process?


So long as we function properly, should userspace care about how our 
source code is structured?


I'm thinking yes, but it might be fruitful to see an answer archived on 
the list.

   Linus


Re: [GIT PULL] Ext3 removal, quota & udf fixes

2015-08-31 Thread Raymond Jennings

On 08/31/15 15:39, Linus Torvalds wrote:

On Mon, Aug 31, 2015 at 3:31 PM, Raymond Jennings  wrote:

That said, I wouldn't mind myself if the ext4 driver were given a very
grueling regression test to make sure it can actually handle old ext3
systems as well as the ext3 driver can.

That's not my only worry. Things like "can you go back to ext3-only"
is an issue too - I don't think that's been a big priority for ext4
any more, and if there are any existing hold-outs that still use ext3,
they may want to be able to go back to old kernels.

Then we should just consider anything making an ext3 system unusable by
older kernels as a regression to be stomped like any other.

So it's not just a "you can use ext4 instead" issue. Can you do that
*without* then forcing an upgrade forever on that partition? I'm not
sure the ext4 people are really even willing to guarantee that kind of
backwards compatibility.

Breaking that guarantee would be an example of such a regression.

I could be ok with removing ext3 in theory, but I haven't seen a lot
of rationale for it, and I don't know if there are still users who may
have their own good reasons to stay with ext3. Maybe there has been
lots of discussion about this on fsdevel (which I don't follow), and
I'm just lacking the background, but if so I want to see that
background. Not just a oneliner description that basically says
"remove ext3 support".

I actually agree that removing support for ext3 as a filesystem is a bad
idea.  That would be a regression.


What I'm in favor of is removing the ext3 code as redundant if ext4 code 
can handle everything.  Of course, for it to be truly redundant, the 
ext4 code has to actually be capable of managing an ext3 filesystem 
without bumping it out of compatibility with older ext3 kernels.  Any 
such bump would rightly be classified as a regression.

 Linus



Re: [GIT PULL] Ext3 removal, quota & udf fixes

2015-08-31 Thread Raymond Jennings

On 08/31/15 14:37, Linus Torvalds wrote:

On Sun, Aug 30, 2015 at 11:19 PM, Jan Kara  wrote:

The biggest change in the pull is the removal of ext3 filesystem driver
(~28k lines removed).

I really am not ready to just remove ext3 without a lot of good
arguments. There might well be people who still use ext3 as ext3, and
don't want to update. I want more of a rationale for removal than "ext4
can read old ext3 filesystems".

I actually would agree that having two drivers for the same filesystem
is redundant and unneeded code duplication.


That said, I wouldn't mind myself if the ext4 driver were given a very 
grueling regression test to make sure it can actually handle old ext3 
systems as well as the ext3 driver can.  Just gutting an entire driver 
because another driver can handle it only makes sense if nothing can go
wrong, and here the potential for causing regressions is quite obvious.


I think also that we should remove the ext2 driver before we remove the 
ext3 driver.


My two cents.

   Linus

Moving more kernel data into highmem?

2015-08-22 Thread Raymond Jennings
Hey, I remembered that there's an option to put third level page tables 
in highmem.


This might be a stupid question, but is there a way to move more kernel 
data into highmem?


For example, page directories, first level page tables?

I even remember a few articles on lwn about how much space is taken up 
by struct page.
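
For a rough sense of scale, here is a back-of-the-envelope sketch.  The
figures are assumptions for illustration only: 4 KiB pages and a 32-byte
struct page (the real size depends on architecture and configuration).  The
helper name memmap_bytes() is made up for this example.

```c
#include <stdint.h>

/*
 * Bytes needed for one struct page per physical page frame.
 * All three parameters are inputs so that different page sizes and
 * struct page sizes can be tried.
 */
static uint64_t memmap_bytes(uint64_t ram_bytes, uint64_t page_size,
                             uint64_t struct_page_size)
{
    return (ram_bytes / page_size) * struct_page_size;
}
```

With those assumed numbers, memmap_bytes(16ULL << 30, 4096, 32) comes to
128 MiB for a 16 GiB machine, which is a noticeable share of the roughly
896 MiB of lowmem that a 32-bit kernel has with the usual 3G/1G split.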


Re: [PATCH] Clean up whitespace in vfs.txt

2015-08-18 Thread Raymond Jennings

On 08/10/15 02:31, Raymond Jennings wrote:

I noticed that vfs.txt looked kinda funky, so I went ahead and
reformatted it.

Signed-off-by: Raymond Jennings
Cc: Andrew Morton 

---

diff --git a/Documentation/filesystems/vfs.txt
b/Documentation/filesystems/vfs.txt
index 5eb8456..8ddfe06 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -114,12 +114,12 @@ members are defined:
  struct file_system_type {
const char *name;
int fs_flags;
-struct dentry *(*mount) (struct file_system_type *, int,
-   const char *, void *);
-void (*kill_sb) (struct super_block *);
-struct module *owner;
-struct file_system_type * next;
-struct list_head fs_supers;
+   struct dentry *(*mount) (struct file_system_type *, int,
+   const char *, void *);
+   void (*kill_sb) (struct super_block *);
+   struct module *owner;
+   struct file_system_type * next;
+   struct list_head fs_supers;
struct lock_class_key s_lock_key;
struct lock_class_key s_umount_key;
  };
@@ -136,7 +136,7 @@ struct file_system_type {
should be shut down
  
owner: for internal VFS use: you should initialize this to

THIS_MODULE in
-   most cases.
+   most cases.
  
next: for internal VFS use: you should initialize this to NULL
  
@@ -145,7 +145,7 @@ struct file_system_type {

  The mount() method has the following arguments:
  
struct file_system_type *fs_type: describes the filesystem, partly

initialized
-   by the specific filesystem code
+   by the specific filesystem code
  
int flags: mount flags
  
@@ -182,12 +182,12 @@ and provides a fill_super() callback instead. The

generic variants are:
mount_nodev: mount a filesystem that is not backed by a device
  
mount_single: mount a filesystem which shares the instance between

-   all mounts
+   all mounts
  
  A fill_super() callback implementation has the following arguments:
  
struct super_block *sb: the superblock structure. The callback

-   must initialize this properly.
+   must initialize this properly.
  
void *data: arbitrary mount options, usually comes as an ASCII

string (see "Mount Options" section)
@@ -208,26 +208,26 @@ This describes how the VFS can manipulate the
superblock of your
  filesystem. As of kernel 2.6.22, the following members are defined:
  
  struct super_operations {

-struct inode *(*alloc_inode)(struct super_block *sb);
-void (*destroy_inode)(struct inode *);
-
-void (*dirty_inode) (struct inode *, int flags);
-int (*write_inode) (struct inode *, int);
-void (*drop_inode) (struct inode *);
-void (*delete_inode) (struct inode *);
-void (*put_super) (struct super_block *);
-int (*sync_fs)(struct super_block *sb, int wait);
-int (*freeze_fs) (struct super_block *);
-int (*unfreeze_fs) (struct super_block *);
-int (*statfs) (struct dentry *, struct kstatfs *);
-int (*remount_fs) (struct super_block *, int *, char *);
-void (*clear_inode) (struct inode *);
-void (*umount_begin) (struct super_block *);
-
-int (*show_options)(struct seq_file *, struct dentry *);
-
-ssize_t (*quota_read)(struct super_block *, int, char *,
size_t, loff_t);
-ssize_t (*quota_write)(struct super_block *, int, const char *,
size_t, loff_t);
+   struct inode *(*alloc_inode)(struct super_block *sb);
+   void (*destroy_inode)(struct inode *);
+
+   void (*dirty_inode) (struct inode *, int flags);
+   int (*write_inode) (struct inode *, int);
+   void (*drop_inode) (struct inode *);
+   void (*delete_inode) (struct inode *);
+   void (*put_super) (struct super_block *);
+   int (*sync_fs)(struct super_block *sb, int wait);
+   int (*freeze_fs) (struct super_block *);
+   int (*unfreeze_fs) (struct super_block *);
+   int (*statfs) (struct dentry *, struct kstatfs *);
+   int (*remount_fs) (struct super_block *, int *, char *);
+   void (*clear_inode) (struct inode *);
+   void (*umount_begin) (struct super_block *);
+
+   int (*show_options)(struct seq_file *, struct dentry *);
+
+   ssize_t (*quota_read)(struct super_block *, int, char *, size_t,
loff_t);
+   ssize_t (*quota_write)(struct super_block *, int, const char *,
size_t, loff_t);
int (*nr_cached_objects)(struct super_block *);
void (*free_cached_objects)(struct super_block *, int);
  };
@@ -238,14 +238,14 @@ only called from a process context (i.e. not from
an interrupt handler
  or bottom half).
  
alloc_inode: this method is called by alloc_inode() to allocate

memory
-   for struct inode and initialize it.  If this function is not
-   defined, a simple 'struct inode' is allocated.  Normally
-   alloc_inode will be used to allocate a larger structure which
- 

Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu

2015-08-13 Thread Raymond Jennings

On 08/13/15 16:18, Linus Torvalds wrote:

> On Thu, Aug 13, 2015 at 4:05 PM, Linus Torvalds wrote:
>>
>> The _only_ thing that matters is that something broke.
>
> To clarify: things like test programs etc don't matter. Real
> applications, used by real users. That's what regressions cover. If
> you have a workflow that isn't just some random kernel test thing, and
> you depend on it, and we break it, the kernel is supposed to fix it.
>
> There are some (very few) exceptions.
>
> If it's a security issue, we may not be able to "fix" it, because
> other concerns can obviously take precedence.
>
> Also, sometimes the reports come in way too late - if you were running
> some stable distro kernel for several years, and updated, and notice a
> change that happened four years ago and modern applications now rely
> on the _new_ behavior, we may not be able to fix the regression any
> more.
>
> But no, "it was an unintentional kernel bug and clearly just stupid
> crap code, and we fixed it and now the kernel is much better and
> cleaner" is not a valid reason for regressions. We'll go back to the
> stupid and crap code if necessary, however much that may annoy us.
>
> For an example of the kind of things we may have to do, see commits
>
>   64f371bc3107 autofs: make the autofsv5 packet file descriptor use a packetized pipe
>   9883035ae7ed pipes: add a "packetized pipe" mode for writing
>
> and just wonder at the insanity. That's the kinds of things that
> happen when one application had actively worked around a bug in
> compatibility handling, and then trying to "fix" that bug just caused
> another application to break instead.
>
>   Linus

Is there a way to temporally confine the bad crap code just to the
applications that depend on it, or does a userspace app latching onto
bad behavior effectively lock down the ABI for the future?


I know that some features in the kernel get deprecated over a process 
(devfs for example) once userspace is given an alternative...would there 
be a process like that to deal with userspace code that is pinning a 
piece of crap in the kernel?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu

2015-08-13 Thread Raymond Jennings



On 08/13/15 14:46, Linus Torvalds wrote:

> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings wrote:
>>
>> I am curious about what's supposed to happen normally on signal delivery.
>>
>> Is SS a register that's supposed to be preserved like EIP/RIP and CS when a
>> signal is delivered?
>
> What exactly does "supposed" mean?

Basically, when a process/thread receives a signal, what happens to its
registers?

> So clearly, we're not "supposed" to save/restore it. Because reality
> matters a hell of a lot more than any theoretical arguments.

So it still counts as a regression if the kernel pulls the rug out from
under someone that was relying on undocumented or buggy behavior?
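
For readers following the register question above, here is a minimal userspace sketch, assuming Linux with glibc. On delivery the kernel saves the interrupted register state into a signal frame, and a handler installed with SA_SIGINFO receives a pointer to that saved state as its third argument. The names on_usr1, deliver_and_check and saved_ip are made up for illustration. The gregs[REG_RIP] access is x86-64/glibc specific and is compiled out elsewhere. The sketch only reads the saved instruction pointer and does not show how SS is handled.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes REG_RIP on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static volatile sig_atomic_t handler_ran;
static volatile unsigned long long saved_ip;  /* stays 0 where REG_RIP is unavailable */

/* SA_SIGINFO handler: ctx points at the register state saved at delivery. */
static void on_usr1(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;

    (void)sig;
    (void)info;
#if defined(__x86_64__) && defined(REG_RIP)
    /* Instruction pointer of the interrupted code, as saved in the frame. */
    saved_ip = (unsigned long long)uc->uc_mcontext.gregs[REG_RIP];
#else
    (void)uc;
#endif
    handler_ran = 1;
}

/* Install the handler, send SIGUSR1 to ourselves, report whether it ran. */
int deliver_and_check(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);            /* installing a SIGUSR1 handler should not fail */

    raise(SIGUSR1);             /* handler runs before raise() returns */
    return handler_ran;
}
```

When the handler returns, the saved state is what gets restored, so anything the handler changes inside the ucontext is what the interrupted code resumes with.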



Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu

2015-08-13 Thread Raymond Jennings

On 08/13/15 13:09, Linus Torvalds wrote:

> On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov wrote:
>>
>> If only I'm not missing something obvious this should not hurt us.
>> But I'm gonna build test kernel and check to be sure tomorrow, ok?
>
> Thanks,
>
>   Linus

I am curious about what's supposed to happen normally on signal delivery.

Is SS a register that's supposed to be preserved like EIP/RIP and CS 
when a signal is delivered?



[PATCH] Clean up whitespace in vfs.txt

2015-08-10 Thread Raymond Jennings
I noticed that vfs.txt looked kinda funky, so I went ahead and
reformatted it.

Signed-off-by: Raymond Jennings
Cc: Andrew Morton 

---

diff --git a/Documentation/filesystems/vfs.txt
b/Documentation/filesystems/vfs.txt
index 5eb8456..8ddfe06 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -114,12 +114,12 @@ members are defined:
 struct file_system_type {
const char *name;
int fs_flags;
-struct dentry *(*mount) (struct file_system_type *, int,
-   const char *, void *);
-void (*kill_sb) (struct super_block *);
-struct module *owner;
-struct file_system_type * next;
-struct list_head fs_supers;
+   struct dentry *(*mount) (struct file_system_type *, int,
+   const char *, void *);
+   void (*kill_sb) (struct super_block *);
+   struct module *owner;
+   struct file_system_type * next;
+   struct list_head fs_supers;
struct lock_class_key s_lock_key;
struct lock_class_key s_umount_key;
 };
@@ -136,7 +136,7 @@ struct file_system_type {
should be shut down
 
   owner: for internal VFS use: you should initialize this to
THIS_MODULE in
-   most cases.
+   most cases.
 
   next: for internal VFS use: you should initialize this to NULL
 
@@ -145,7 +145,7 @@ struct file_system_type {
 The mount() method has the following arguments:
 
   struct file_system_type *fs_type: describes the filesystem, partly
initialized
-   by the specific filesystem code
+   by the specific filesystem code
 
   int flags: mount flags
 
@@ -182,12 +182,12 @@ and provides a fill_super() callback instead. The
generic variants are:
   mount_nodev: mount a filesystem that is not backed by a device
 
   mount_single: mount a filesystem which shares the instance between
-   all mounts
+   all mounts
 
 A fill_super() callback implementation has the following arguments:
 
   struct super_block *sb: the superblock structure. The callback
-   must initialize this properly.
+   must initialize this properly.
 
   void *data: arbitrary mount options, usually comes as an ASCII
string (see "Mount Options" section)
@@ -208,26 +208,26 @@ This describes how the VFS can manipulate the
superblock of your
 filesystem. As of kernel 2.6.22, the following members are defined:
 
 struct super_operations {
-struct inode *(*alloc_inode)(struct super_block *sb);
-void (*destroy_inode)(struct inode *);
-
-void (*dirty_inode) (struct inode *, int flags);
-int (*write_inode) (struct inode *, int);
-void (*drop_inode) (struct inode *);
-void (*delete_inode) (struct inode *);
-void (*put_super) (struct super_block *);
-int (*sync_fs)(struct super_block *sb, int wait);
-int (*freeze_fs) (struct super_block *);
-int (*unfreeze_fs) (struct super_block *);
-int (*statfs) (struct dentry *, struct kstatfs *);
-int (*remount_fs) (struct super_block *, int *, char *);
-void (*clear_inode) (struct inode *);
-void (*umount_begin) (struct super_block *);
-
-int (*show_options)(struct seq_file *, struct dentry *);
-
-ssize_t (*quota_read)(struct super_block *, int, char *,
size_t, loff_t);
-ssize_t (*quota_write)(struct super_block *, int, const char *,
size_t, loff_t);
+   struct inode *(*alloc_inode)(struct super_block *sb);
+   void (*destroy_inode)(struct inode *);
+
+   void (*dirty_inode) (struct inode *, int flags);
+   int (*write_inode) (struct inode *, int);
+   void (*drop_inode) (struct inode *);
+   void (*delete_inode) (struct inode *);
+   void (*put_super) (struct super_block *);
+   int (*sync_fs)(struct super_block *sb, int wait);
+   int (*freeze_fs) (struct super_block *);
+   int (*unfreeze_fs) (struct super_block *);
+   int (*statfs) (struct dentry *, struct kstatfs *);
+   int (*remount_fs) (struct super_block *, int *, char *);
+   void (*clear_inode) (struct inode *);
+   void (*umount_begin) (struct super_block *);
+
+   int (*show_options)(struct seq_file *, struct dentry *);
+
+   ssize_t (*quota_read)(struct super_block *, int, char *, size_t,
loff_t);
+   ssize_t (*quota_write)(struct super_block *, int, const char *,
size_t, loff_t);
int (*nr_cached_objects)(struct super_block *);
void (*free_cached_objects)(struct super_block *, int);
 };
@@ -238,14 +238,14 @@ only called from a process context (i.e. not from
an interrupt handler
 or bottom half).
 
   alloc_inode: this method is called by alloc_inode() to allocate
memory
-   for struct inode and initialize it.  If this function is not
-   defined, a simple 'struct inode' is allocated.  Normally
-   alloc_inode will be used to allocate a larger structure which
-   contains a 'struct inode' embedded within it.
+   for struct inode and 

Re: Dealing with the NMI mess

2015-07-24 Thread Raymond Jennings
On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
> 
> Hi all-
> 
> We've considered two approaches to dealing with NMIs:
> 
> 1. Allow nesting.  We know quite well how messy that is.

This might be a stupid question, but:

1.  What exactly does the NMI handler handle?
2.  Is it possible for the NMI handler to just increment a counter and
return if it nests, and let the outer handler notice and rerun itself?
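
A minimal userspace sketch of the idea in question 2, in plain C rather than kernel code. Every name here (nmi_handler, nmi_pending, nmi_simulate_nested_once and so on) is hypothetical. The nested arrival is simulated by a function call, so the sketch deliberately ignores the hardware side of the problem, such as what a real nested NMI does to the outer handler's stack frame, and it uses no atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool nmi_active;               /* true while the outermost handler runs */
static int  nmi_pending;              /* nested arrivals deferred to the outer handler */
static int  work_runs;                /* how many times the real work has run */
static void (*inject_nested)(void);   /* test hook: simulates an NMI mid-handler */

void nmi_handler(void);

/* Stand-in for the real NMI work; may be "interrupted" once via the hook. */
static void do_nmi_work(void)
{
    assert(nmi_active);               /* work only ever runs in the outermost handler */
    work_runs++;
    if (inject_nested != NULL) {
        void (*fire)(void) = inject_nested;
        inject_nested = NULL;         /* fire only once */
        fire();
    }
}

void nmi_handler(void)
{
    if (nmi_active) {
        /* Nested arrival: just count it and return immediately. */
        nmi_pending++;
        return;
    }

    nmi_active = true;
    do {
        nmi_pending = 0;
        do_nmi_work();
    } while (nmi_pending > 0);        /* rerun if something arrived meanwhile */
    nmi_active = false;
}

/* Arrange for one simulated nested NMI during the next handler run. */
void nmi_simulate_nested_once(void) { inject_nested = nmi_handler; }

int nmi_work_runs(void) { return work_runs; }
```

With one simulated nested arrival, the outer handler runs its work twice: once for the original NMI and once for the deferred one. Several nested arrivals during a single pass collapse into one rerun, which is only acceptable if the work is idempotent.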

> 2. Forbid IRET inside NMIs.  Doable but maybe not that pretty.
> 
> We haven't considered:
> 
> 3. Forbid faults (other than MCE) inside NMI.
> 
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).
> 
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
> 
> a) vmalloc fault.  I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
> 
> b) User memory access faults.
> 
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space.  What if we did it differently?  Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode.  That would let us capture user state
> *correctly*, which we currently can't really do.  There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works.  In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state.  (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
> 
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself.  It would look like:
> 
> prepare_exit_to_usermode() {
>   ...
>   while (blah blah blah) {
> if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
>   perf_capture_user_state();
> ...
>   }
>   ...
> }
> 
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunistic sysret on exit
> from NMI.  (Why not?  Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
> 
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set.  In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
> 
> Peter, can this be done without breaking the perf ABI?  If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
> 
> 
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best.  Maybe we should go one step
> farther and ban all DR7 breakpoints period.  Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
> 
> --Andy




Re: [PATCH] x86: Fix detection of GCC -mpreferred-stack-boundary support

2015-07-06 Thread Raymond Jennings
On Mon, 2015-07-06 at 10:59 -0700, Andy Lutomirski wrote:
> On Mon, Jul 6, 2015 at 10:40 AM, Ingo Molnar wrote:
> >
> > * Andy Lutomirski wrote:
> >
> >> > My reasoning: on modern uarchs there's no penalty for 32-bit misalignment of
> >> > 64-bit variables, only if they cross 64-byte cache lines, which should be rare
> >> > with a chance of 1:16. This small penalty (of at most +1 cycle in some
> >> > circumstances IIRC) should be more than counterbalanced by the compression of
> >> > the stack by 5% on average.
> >>
> >> I'll counter with: what's the benefit?  There are no operations that will
> >> naturally change RSP by anything that isn't a multiple of 8 (there's no pushl in
> >> 64-bit mode, or at least not on AMD chips -- the Intel manual is a bit vague on
> >> this point), so we'll end up with RSP being a multiple of 8 regardless.  Even if
> >> we somehow shaved 4 bytes off in asm, that still wouldn't buy us anything, as a
> >> dangling 4 bytes at the bottom of the stack isn't useful for anything.
> >
> > Yeah, so it might be utilized in frame-pointer less builds (which we might be able
> > to utilize in the future if sane Dwarf code comes around), which does not use
> > push/pop to manage the stack but often has patterns like:
> >
> > 8102aa90 :
> > 8102aa90:   48 83 ec 18             sub    $0x18,%rsp
> >
> > and uses MOVs to manage the stack. Those kinds of stack frames could be 4-byte
> > granular as well.
> >
> > But yeah ... it's pretty marginal.
> 
> To get even that, we'd need an additional ABI-changing GCC flag to
> change GCC's idea of the alignment of long from 8 to 4.  (I just
> checked: g++ thinks that alignof(long) == 8.  I was too lazy to look
> up how to ask the equivalent question in C.)

I just want to point out that long itself is 8 bytes on 64-bit x86, but
only 4 bytes on 32-bit x86.

Perhaps we should keep in mind sizeof(long) and not just alignof(long)?

My opinion btw, is that if long is 8 bytes wide, it should also be 8
bytes aligned.
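
For what it's worth, the C equivalent of the g++ check quoted above is
C11's _Alignof operator.  A minimal sketch, assuming a C11 compiler; the
helper names long_size and long_align are made up for illustration and
are not from the thread:

```c
#include <stddef.h>

/* Hypothetical helpers: report what the target ABI says about long. */

/* Size of long in bytes (8 on 64-bit x86 Linux, 4 on 32-bit x86). */
static size_t long_size(void)
{
	return sizeof(long);
}

/* Required alignment of long in bytes, via the C11 _Alignof operator. */
static size_t long_align(void)
{
	return _Alignof(long);
}
```

Printing both values with printf("%zu %zu\n", long_size(), long_align())
on each target shows whether size and alignment agree under that
target's default ABI.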

> --Andy


Re: [PATCH 1/1] tty: fix up atime/mtime mess, take four

2015-03-06 Thread Raymond Jennings
On Fri, 2015-02-27 at 18:40 +0100, Jiri Slaby wrote:
> So check the absolute difference of times and if it is larger than "8
> seconds or so", always update the time. That means we will update
> immediately when changing time. Ergo, CAP_SYS_TIME can foul the
> check, but it was always that way.

If I may ask, what is supposed to happen normally when you write to a
tty device?  I always thought the tty device was treated just like a
normal file wrt. timestamps.

Now I see a patch for 8 seconds something.
> 
> Thanks John for serving me this so nicely debugged.
> 
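> Signed-off-by: Jiri Slaby <jsl...@suse.cz>
> Reported-by: John Paul Perry <john_paul.pe...@alcatel-lucent.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Cc: <sta...@vger.kernel.org> # all, as b0b885657 was backported
> Cc: Linus Torvalds <torva...@linux-foundation.org>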
> ---
>  drivers/tty/tty_io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index e07f35e14fa2..e31b18a6d576 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1032,8 +1032,8 @@ EXPORT_SYMBOL(start_tty);
>  /* We limit tty time update visibility to every 8 seconds or so. */
>  static void tty_update_time(struct timespec *time)
>  {
> - unsigned long sec = get_seconds() & ~7;
> - if ((long)(sec - time->tv_sec) > 0)
> + unsigned long sec = get_seconds();
> + if (abs(sec - time->tv_sec) & ~7)
>   time->tv_sec = sec;
>  }
>  
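
To compare the two versions of the check in the patch above, here is a
userspace sketch, not the kernel code.  get_seconds() is replaced by a
plain `now` argument, the function names are hypothetical, and it
assumes the kernel's abs() treats the difference as a signed long:

```c
#include <stdbool.h>

/*
 * Old check: round the clock down to a multiple of 8 seconds and update
 * only if that rounded value is ahead of the stored timestamp.
 */
static bool old_should_update(unsigned long now, long stamp)
{
	unsigned long sec = now & ~7UL;

	return (long)(sec - (unsigned long)stamp) > 0;
}

/*
 * New check: update whenever the clock and the stored timestamp differ
 * by 8 seconds or more, in either direction.
 */
static bool new_should_update(unsigned long now, long stamp)
{
	long diff = (long)(now - (unsigned long)stamp);

	if (diff < 0)
		diff = -diff;
	return (diff & ~7L) != 0;
}
```

With stamp = 2000 and now = 1000 (the clock was set backwards), the old
check refuses to update until the clock passes 2000 again, while the new
check updates right away.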


mmotm: swap overflow warning patch: mangled description and missing review tag

2013-07-21 Thread Raymond Jennings
I checked the mmotm queue and it seems that my mid-air corrections got
the patch mangled when it was saved to your mail queue.  In addition
to a missing correction of a typo in my testing log, Rik van Riel's
Reviewed-by tag vanished:

http://www.ozlabs.org/~akpm/mmotm/broken-out/swap-warn-when-a-swap-area-overflows-the-maximum-size.patch

If you could fix my test transcript and properly credit Rik for
reviewing my patch before you ship it to Linus, I'd appreciate it.

The correctly formatted patch and description with corrections and tags
follows:

From: Raymond Jennings <shent...@gmail.com>
Subject: swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 64G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k

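Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
Reviewed-by: Rik van Riel <r...@redhat.com>
Cc: Hugh Dickins <hu...@google.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>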
---

 mm/swapfile.c |6 ++
 1 file changed, 6 insertions(+)

diff -puN mm/swapfile.c~swap-warn-when-a-swap-area-overflows-the-maximum-size mm/swapfile.c
--- a/mm/swapfile.c~swap-warn-when-a-swap-area-overflows-the-maximum-size
+++ a/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(st
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk\n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */
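
The sizes in the warning come from shifting a page count by
(PAGE_SHIFT - 10) to get KiB.  A small userspace sketch of that
arithmetic, assuming 4 KiB pages (PAGE_SHIFT of 12); the macro and
function names are hypothetical and not part of the patch:

```c
/* Assumption: 4 KiB pages, as on typical x86 configurations. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Convert a page count to KiB the same way the printk arguments do:
 * pages << (PAGE_SHIFT - 10).
 */
static unsigned long long pages_to_kib(unsigned long long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}
```

Under that assumption, a maxpages of 8388608 gives 33554432k and a
last_page of 16777215 gives 67108860k, which are the two figures in the
test log above.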


Re: [ATTEND] How to act on LKML (was: [ 00/19] 3.10.1-stable review)

2013-07-15 Thread Raymond Jennings
On Mon, 2013-07-15 at 15:38 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 3:08 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> >
> > Can we please make this into a Kernel Summit discussion. I highly doubt
> > we would solve anything, but it certainly would be a fun segment to
> > watch :-)
> 
> I think we should, because I think it's the kind of thing we really
> need at the KS - talking about "process".
> 
> At the same time, I really don't know what the format would possibly
> be like for it to really work as a reasonable discussion. And I think
> that is important, because this kind of subject is *not* likely
> possible in the traditional "people sit around tables and maybe
> somebody has a few slides" format.
> 
> A small panel discussion with a few people (fiveish?) that have very
> different viewpoints, along with baskets of rotten fruit set out on
> the tables? That could be fun. And I'm serious, although we might want
> to limit the size of the fruit to smaller berries ;)
> 
> Sarah will bring the brownies.

I'm sure slashdot will be happy to follow up, seeing as how this heated
discussion just made headlines there.

http://linux.slashdot.org/story/13/07/15/2316219/kernel-dev-tells-linus-torvalds-to-stop-using-abusive-language

Personally I *like* when abusive language is used, assuming it's used
appropriately.  I *hate very much* when people are nice to me and let
their frustrations grow, only to ambush me later with a string of curses
and lashings in one fell swoop.  Not only does "holding it in" set me up
for failure because I remain ignorant, I also feel downright betrayed
when they come off as vindictive bastards that saved their beefs until
the moment was ripe to do the most damage.  It doesn't just make me lose
respect for them, it makes me lose trust.

Give me an honest asshole over a silver tongued backstabber any day.

>Linus


Re: CONFIG_* used by user-space to figure out whether a feature is on/off

2013-07-15 Thread Raymond Jennings
On Mon, 2013-07-15 at 17:53 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 5:46 PM, Raymond Jennings <shent...@gmail.com> wrote:
> >
> > I'd like to point out that Google Chrome also makes use of CONFIG_ tests
> > to detect support for namespaces and pid containers and stuff.
> 
> Hmm. It must work fine despite that. Because I run self-compiled
> kernels and there are no config files to be found by user apps.
> Neither in /proc nor in /boot. And I'm using chrome to write this.
> 
>   Linus

It could be the quirks of my package manager though.

I run Gentoo, so it's entirely possible that the ebuild is doing all the
bitching, but chrome itself just fails gracefully and falls back to not
using those features if it can't find them.

I imagine though that the same stuff applying to applications in general
should also apply to installers.

Anyway I just wanted to highlight that it's not just the xen stuff
that's peeking at kernel config.
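
One way to get the "fail gracefully and fall back" behaviour without
reading any config file is to probe the running kernel's interface
directly.  The sketch below is only an illustration of that idea, not
what Chrome or the ebuild actually does; the helper name is
hypothetical, and /proc/self/ns/pid is used as an example path that
exists only on kernels that expose PID namespaces there:

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: true if the given path exists and is visible to
 * the calling process.  A caller treats false as "feature unavailable"
 * and falls back instead of failing.
 */
static bool kernel_exposes(const char *path)
{
	return access(path, F_OK) == 0;
}
```

A caller would then write something like
if (kernel_exposes("/proc/self/ns/pid")) and skip the namespace-based
code path otherwise.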


Re: CONFIG_* used by user-space to figure out whether a feature is on/off

2013-07-15 Thread Raymond Jennings
On Mon, 2013-07-15 at 10:17 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 8:40 AM, Konrad Rzeszutek Wilk
> <konrad.w...@oracle.com> wrote:
> >
> > I am hoping you can help me draw an understanding and a line in sand
> > whether:
> >  a) Tools should not depend on /proc/config.gz to figure out whether
> > a kernel has some CONFIG_X=y feature.
> 
> Well, /proc/config.gz is better than some crazy saved-off config file,
> since it at least is guaranteed to match the kernel you're running,
> but it's still a completely crazy idea. Not the least because it's not
> at all guaranteed to be there, and even if it's there, we'll rename
> config options without caring one whit. It's meant for "make
> oldconfig" style stuff, nothing more. Any user program that depends on
> it is broken by design.
> 
> >  b) If they are OK to do so, what do we do when certain CONFIG_X options
> > get reworked/removed. Would they be considered regressions? Aka
> > is this similar to 'you shall not break user-space'?
> 
> Absolutely not. If you depend on any config file, you're broken by
> definition. The only thing that can depend on the config file is the
> kernel tree itself, and even then we happily break that at any time
> (ie "make oldconfig" is meant to give an _approximation_ of the old
> config, but if some config option gets renamed, the old value is
> thrown away without question, and the new name is asked about).
> 
> > Irrespective of that, do you have any ideas of how a user-space program
> > (say GRUB) can figure out whether the configuration stanza it generates
> > is supported by the kernel.

I'd like to point out that Google Chrome also makes use of CONFIG_ tests
to detect support for namespaces and pid containers and stuff.

> > If you don't want to answer this question - since this might
> > open a can of worms you prefer not to deal with - that is absolutely OK.
> 
> I think grub should stop trying to be clever. Quite frankly, from my
> own experience, grub has become too clever by half, and become harder
> to use and configure as a result. Just don't do it.
> 
> If you want to have grub Xen options for the kernel, make them grub
> options. In the grub config file. And if that option isn't there, just
> boot it as a native kernel. That had better work anyway, and is a hell
> of a lot more flexible and stable anyway. Don't try to be clever, and
> certainly don't try to parse some random config file that may or may
> not even match the kernel you're booting.
> 
>  Linus


Re: [ 00/19] 3.10.1-stable review

2013-07-15 Thread Raymond Jennings
On Mon, 2013-07-15 at 12:23 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 12:17 PM, Willy Tarreau <w...@1wt.eu> wrote:
> >
> > BTW, I was amazed that you managed to get him to have a much softer tone in
> > his last e-mail, you probably found a weakness here in his management
> > process :-)
> 
> Hey, I _like_ arguing, and "cursing" and "arguing" are actually not at
> all the same thing.
> 
> And I really don't tend to curse unless people are doing something
> stupid and annoying. If people have concerns and questions that I feel
> are valid, I'm more than happy to talk about it.
> 
> I curse when there isn't any argument. The cursing happens for the
> "you're so f*cking wrong that it's not even worth trying to make
> logical arguments about it, because you have no possible excuse" case.
> 
> .. and sometimes people surprise me and come back with a valid excuse
> after all. "My whole family died in a tragic freak accident and my
> pony got cancer, and I was distracted".

...At least with the recent SCOTUS ruling, if you took your pony to a
vet you wouldn't have to worry about Hasbro suing him for patent
infringement...

> And then I might even tell them I'm sorry.
> 
> No. Not really.
> 
>Linus


Re: [PATCH v2] swap: warn when a swap area overflows the maximum size

2013-07-07 Thread Raymond Jennings
Screwed up and didn't attach my fixed test log to the second version.

See below.

On Sun, 2013-07-07 at 15:31 -0400, Rik van Riel wrote:
> On 07/07/2013 03:13 PM, Raymond Jennings wrote:
> > Turned the comparison around for clarity of "bigger than"
> >
> > No semantic changes, if it still compiles it should do the same thing so
> > I've omitted the testing this time.  Will be happy to retest if required
> > but I'm on an atom 330 and kernel rebuilds are a nightmare.
> 
> Added CC: Andrew Morton, since this should probably go into -mm :)
> 
> > 
> >
> > swap: warn when a swap area overflows the maximum size
> >
> > It is possible to swapon a swap area that is too big for the pte width
> > to handle.
> >
> > Presently this failure happens silently.
> >
> > Instead, emit a diagnostic to warn the user.
> >
> > Signed-off-by: Raymond Jennings <shent...@gmail.com>
> > Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
> 
> Reviewed-by: Rik van Riel <r...@redhat.com>
> 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 36af6ee..5a4ce53 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
> >  	 */
> >  	maxpages = swp_offset(pte_to_swp_entry(
> >  			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
> > +	if (swap_header->info.last_page > maxpages) {
> > +		printk(KERN_WARNING
> > +		       "Truncating oversized swap area, only using %luk out of %luk\n",
> > +		       maxpages << (PAGE_SHIFT - 10),
> > +		       swap_header->info.last_page << (PAGE_SHIFT - 10));
> > +	}
> >  	if (maxpages > swap_header->info.last_page) {
> >  		maxpages = swap_header->info.last_page + 1;
> >  		/* p->max is an unsigned int: don't overflow it */
> >
> > 
> >
> > Testing results, root prompt commands and kernel log messages:
> >
> > # lvresize /dev/system/swap --size 16G
> > # mkswap /dev/system/swap
> > # swapon /dev/system/swap
> >
> > Jul  7 04:27:22 warfang kernel: Adding 16777212k swap
> > on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k
> >
> > # lvresize /dev/system/swap --size 16G

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
> # lvresize /dev/system/swap --size 16G

Typo in the second test.

The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

> # mkswap /dev/system/swap
> # swapon /dev/system/swap

> > # mkswap /dev/system/swap
> > # swapon /dev/system/swap
> >
> > Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
> > using 33554432k out of 67108860k
> > Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
> > on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
> >
> >
> 
> 


[PATCH v2] swap: warn when a swap area overflows the maximum size

2013-07-07 Thread Raymond Jennings
Turned the comparison around for clarity of "bigger than"

No semantic changes, if it still compiles it should do the same thing so
I've omitted the testing this time.  Will be happy to retest if required
but I'm on an atom 330 and kernel rebuilds are a nightmare.



swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings 
Acked-by: Valdis Kletnieks 



diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk\n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */



Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k 

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k 
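
A userspace sketch of the check this patch adds, for anyone who wants to
reproduce the numbers in the log above without rebuilding a kernel.
Assumptions, none of which come from the patch itself: PAGE_SHIFT is 12
(4 KiB pages), the helper names pages_to_kib() and warn_if_oversized()
are made up for illustration, and printf() stands in for printk().

```c
#include <stdio.h>

/* Assumed for illustration: 4 KiB pages. */
#define PAGE_SHIFT 12

/* Convert a page count to KiB using the same shift as the patch. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}

/*
 * Userspace stand-in for the added check. Returns 1 and prints the
 * warning when the header's last_page is larger than the number of
 * pages a swap pte can address, 0 otherwise.
 */
static int warn_if_oversized(unsigned long last_page, unsigned long maxpages)
{
	if (last_page > maxpages) {
		printf("Truncating oversized swap area, only using %luk out of %luk\n",
		       pages_to_kib(maxpages), pages_to_kib(last_page));
		return 1;
	}
	return 0;
}
```

Calling warn_if_oversized(16777215, 8388608) prints the same
33554432k / 67108860k pair as the log. Those page counts are inferred
from the log output, not read from a running kernel.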


Re: [PATCH] swap: warn when a swap area overflows the maximum size (resent)

2013-07-07 Thread Raymond Jennings
...I hate you gmail...

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
> # lvresize /dev/system/swap --size 16G

Typo in the second test.

The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

> # mkswap /dev/system/swap
> # swapon /dev/system/swap
> 
> Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
> using 33554432k out of 67108860k
> Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
> on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k 


[PATCH] swap: warn when a swap area overflows the maximum size (resent)

2013-07-07 Thread Raymond Jennings
Silly me, wrong email address

On Sun, 2013-07-07 at 04:44 -0700, Raymond Jennings wrote:
swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings 
Acked-by: Valdis Kletnieks 



diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (maxpages < swap_header->info.last_page) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk\n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */



Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k 

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k 


[PATCH] swap: warn when a swap area overflows the maximum size

2013-07-07 Thread Raymond Jennings
swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings 
Acked-by: Valdis Kletnieks 



diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (maxpages < swap_header->info.last_page) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk\n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */



Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k 

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k 


Re: [problem?] swapon: swap partition/volume size capped at 32G?

2013-07-04 Thread Raymond Jennings
On Wed, 2013-07-03 at 14:21 -0700, Raymond Jennings wrote:
> Ok, so I just upgraded to 3.10.0 (gentoo system) and made a nice big
> 64GiB swap volume on lvm as usual.
> 
> Suddenly, swapon doesn't recognize more than 32GiB, as top lists only
> that much swap space.
> 
> swapon using *two* separate 32GiB partitions works fine, but for some
> reason a swap partition bigger than 32GiB isn't fully recognized.
> 
> Previous kernel versions IIRC recognized the entire swap partition.
> 
> Is something wrong or is this new behavior standard?
> 
Hold on a minute, I just found out something ate my kernel config and
turned off PAE when I upgraded.

This in turn shrunk my pte's from 64 bits to 32 bits and is probably
what killed >32G swap extents.
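
Rough arithmetic behind the cap, as a sketch: a swap entry has to fit in
a pte, so the number of bits left over for the swap offset bounds how
many swap pages can be addressed. The 23-bit figure in the usage note is
an assumption chosen because it reproduces the 32 GiB limit seen here;
it is not taken from the kernel headers. PAGE_SHIFT = 12 and the
function names are likewise illustrative only.

```c
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define PAGE_SHIFT 12

/* Number of swap pages addressable with offset_bits bits of offset. */
static uint64_t max_swap_pages(unsigned int offset_bits)
{
	return (uint64_t)1 << offset_bits;
}

/*
 * The same limit expressed in GiB. Only meaningful while
 * offset_bits + PAGE_SHIFT stays below 64.
 */
static uint64_t max_swap_gib(unsigned int offset_bits)
{
	return (max_swap_pages(offset_bits) << PAGE_SHIFT) >> 30;
}
```

Under these assumptions max_swap_gib(23) is 32, and one extra offset bit
doubles it to 64, which is why a narrower pte can silently halve the
usable size of a 64GiB volume.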


[problem?] swapon: swap partition/volume size capped at 32G?

2013-07-03 Thread Raymond Jennings
Ok, so I just upgraded to 3.10.0 (gentoo system) and made a nice big
64GiB swap volume on lvm as usual.

Suddenly, swapon doesn't recognize more than 32GiB, as top lists only
that much swap space.

swapon using *two* separate 32GiB partitions works fine, but for some
reason a swap partition bigger than 32GiB isn't fully recognized.

Previous kernel versions IIRC recognized the entire swap partition.

Is something wrong or is this new behavior standard?


Re: [PATCH RFC ticketlock] Auto-queued ticketlock

2013-06-12 Thread Raymond Jennings
On Wed, 2013-06-12 at 13:26 -0700, Linus Torvalds wrote:
> On Wed, Jun 12, 2013 at 1:03 PM, Davidlohr Bueso wrote:
> >
> > According to him:
> >
> > "the short workload calls security functions like getpwnam(),
> > getpwuid(), getgrgid() a couple of times. These functions open
> > the /etc/passwd or /etc/group files, read their content and close the
> > files.
> 
> Ahh, ok. So yeah, it's multiple threads all hitting the same file

If that's the case and it's a bunch of reads, shouldn't they act
concurrently anyway?

I mean it's not like dentries are being changed or added or removed in
this case.

> I guess that /etc/passwd case is historically interesting, but I'm not
> sure we really want to care too deeply..
> 
> > I did a quick attempt at this (patch attached).
> 
> Yeah, that's wrong, although it probably approximates the dget() case
> (but incorrectly).
> 
> One of the points behind using an atomic d_count is that then dput() should do
> 
>    if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock))
>       return;
> 
> at the very top of the function. It can avoid taking the lock entirely
> if the count doesn't go down to zero, which would be a common case if
> you have lots of users opening the same file. While still protecting
> d_count from ever going to zero while the lock is held.
> 
> Your
> 
> +       if (atomic_read(&dentry->d_count) > 1) {
> +               atomic_dec(&dentry->d_count);
> +               return;
> +       }
> +       spin_lock(&dentry->d_lock);
> 
> pattern is fundamentally racy, but it's what "atomic_dec_and_lock()"
> should do race-free.
> 
> For similar reasons, I think you need to still maintain the d_lock in
> d_prune_aliases etc. That's a slow-path, so the fact that we add an
> atomic sequence there doesn't much matter.
> 
> However, one optimization missing from your patch is obvious in the
> profile. "dget_parent()" also needs to be optimized - you still have
> that as 99% of the spin-lock case. I think we could do something like
> 
>    rcu_read_lock();
>    parent = ACCESS_ONCE(dentry->d_parent);
>    if (atomic_inc_nonzero(&parent->d_count))
>       return parent;
>    .. get d_lock and do it the slow way ...
>    rcu_read_unlock();
> 
> to locklessly get the parent pointer. We know "parent" isn't going
> away (dentries are rcu-free'd and we hold the rcu read lock), and I
> think that we can optimistically take *any* parent dentry that
> happened to be valid at one point. As long as the refcount didn't go
> down to zero. Al?
> 
> With dput and dget_parent() both being lockless for the common case,
> you might get rid of the d_lock contention entirely for that load. I
> dunno. And I should really think more about that dget_parent() thing a
> bit more, but I cannot imagine how it could not be right (because even
> with the current d_lock model, the lock is gotten *within*
> dget_parent(), so the caller can never know if it gets a new or an old
> parent, so there is no higher-level serialization going on - and we
> might as well return *either* the new or the old as such).
> 
> I really want Al to double-check me if we decide to try going down
> this hole. But the above two fixes to your patch should at least
> approximate the d_lock changes, even if I'd have to look more closely
> at the other details of your patch..
> 
> Linus

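A userspace sketch of the two lockless fast paths discussed above, using
C11 atomics and a pthread mutex in place of atomic_t and spinlock_t.
struct obj, dec_and_lock() and inc_not_zero() are illustrative names,
not kernel API. The compare-and-swap loop is the point: with a plain
"read, then decrement", two threads can both see a count of 2 and both
decrement without the lock, so the count reaches zero unprotected.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a dentry. */
struct obj {
	atomic_int count;
	pthread_mutex_t lock;
};

/*
 * Drop one reference. Returns true with the lock held if the count
 * reached zero, false (lock not held) otherwise.
 */
static bool dec_and_lock(struct obj *o)
{
	int old = atomic_load(&o->count);

	/* Fast path: decrement without the lock unless we would hit zero. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->count, &old, old - 1))
			return false;
		/* A failed exchange reloaded 'old'; loop and re-check it. */
	}

	/* Slow path: the count may reach zero, so serialize on the lock. */
	pthread_mutex_lock(&o->lock);
	if (atomic_fetch_sub(&o->count, 1) == 1)
		return true;	/* hit zero, caller now owns the lock */
	pthread_mutex_unlock(&o->lock);
	return false;
}

/* Take a reference only if the object is still live (count != 0). */
static bool inc_not_zero(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;
}
```

dec_and_lock() returns true only when the count reached zero, so the
caller can tear the object down and then unlock. inc_not_zero() is the
shape of the optimistic parent lookup: if it fails, fall back to taking
the lock and doing it the slow way.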


Re: Yet another pipe related oops.

2013-03-27 Thread Raymond Jennings
On Wed, Mar 27, 2013 at 9:33 AM, Linus Torvalds wrote:
> On Wed, Mar 27, 2013 at 8:20 AM, Al Viro wrote:
>>
>> Actually, that's my fault - check lost in patch reordering.  My apologies ;-/
>> Eventually, we want that in fs/splice.c side of things (no point repeating it
>> for every buffer, after all), but for now this is the obvious minimal fix.
>
> Applied.
>
> Do we actually have files with NULL f_ops pointers? Should we? What
> could we possibly do with a file descriptor that doesn't have any
> fops?

For the sake of the curious including myself:

How would such a NULL f_ops file get created in the first place?

> Also, perhaps we should do something more akin to what we do for
> dentry functions where we validate them on registration, and we could
> fix up or validate read/write pointers, with semantics something like
>
>     if (!fop->write)
>         fop->write = fop->aio_write ? do_sync_write : EINVAL_write;
>     if (!fop->read)
>         fop->read = fop->aio_read ? do_sync_read : EINVAL_read;
>
> kind of things?
>
> Not a big deal, perhaps.
>
>   Linus

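The registration-time fix-up suggested above, sketched in userspace C so
the idea can be tried outside the kernel. Everything here is a
hypothetical stand-in: struct file_ops is a much-reduced imitation of
struct file_operations, and the einval_*, sync_* and demo_aio_read
functions are made up for illustration rather than being the kernel's
do_sync_read/do_sync_write. Note that it only covers missing read/write
methods, not a NULL ops pointer itself.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct file_ops;

typedef ssize_t (*read_fn)(const struct file_ops *fop, void *buf, size_t len);
typedef ssize_t (*write_fn)(const struct file_ops *fop, const void *buf,
			    size_t len);

/* Hypothetical, much-reduced stand-in for struct file_operations. */
struct file_ops {
	read_fn read;
	write_fn write;
	read_fn aio_read;
	write_fn aio_write;
};

/* Fallbacks for files that offer neither a sync nor an aio method. */
static ssize_t einval_read(const struct file_ops *fop, void *buf, size_t len)
{
	(void)fop; (void)buf; (void)len;
	return -EINVAL;
}

static ssize_t einval_write(const struct file_ops *fop, const void *buf,
			    size_t len)
{
	(void)fop; (void)buf; (void)len;
	return -EINVAL;
}

/* Sync wrappers that simply forward to the aio method. */
static ssize_t sync_read(const struct file_ops *fop, void *buf, size_t len)
{
	return fop->aio_read(fop, buf, len);
}

static ssize_t sync_write(const struct file_ops *fop, const void *buf,
			  size_t len)
{
	return fop->aio_write(fop, buf, len);
}

/* Hypothetical driver method, present only to exercise the fix-up. */
static ssize_t demo_aio_read(const struct file_ops *fop, void *buf, size_t len)
{
	(void)fop; (void)buf;
	return (ssize_t)len;	/* pretend the whole request was read */
}

/* Fill in missing methods once, so callers never see a NULL pointer. */
static void fixup_file_ops(struct file_ops *fop)
{
	if (!fop->write)
		fop->write = fop->aio_write ? sync_write : einval_write;
	if (!fop->read)
		fop->read = fop->aio_read ? sync_read : einval_read;
}
```

After fixup_file_ops() every caller can invoke fop->read and fop->write
unconditionally; an ops table with only demo_aio_read set gets sync_read
for reads and einval_write for writes.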

Re: [GIT PULL] late arch/metag fixes for v3.9-rc1

2013-03-02 Thread Raymond Jennings
On Sat, Mar 2, 2013 at 10:10 AM, Borislav Petkov <b...@alien8.de> wrote:
> On Sat, Mar 02, 2013 at 08:28:56AM -0800, Linus Torvalds wrote:
>> On Sat, Mar 2, 2013 at 2:22 AM, James Hogan <james.ho...@imgtec.com> wrote:
>> >
>> > Okay, I've rebased the arch/metag tree onto mainline to make all the
>> > back-merges unnecessary and applied those simple fixes into "Build
>> > infrastructure" and "Various other headers" commits (additionally
>> > trivially removing ARCH_NO_VIRT_TO_BUS which is also now unnecessary).
>>
>> No, this is *exactly* the wrong thing to do.
>
> <snip good practices and musings about maintainer trees>
>
> Hmm, so this comes up almost every time new maintainers send stuff (and
> when seasoned maintainers forget :)), maybe we should hold it down
> somewhere in Documentation/ for future reference?

Hear hear!

Come to think of it, given how often Linus has bitched about rebasing
and back merging, I'm surprised it's not already mentioned.

> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.


Re: [GIT PULL] Load keys from signed PE binaries

2013-02-26 Thread Raymond Jennings
My two cents on this subject btw is that anything to do with
Microsoft's intentions or plans is an issue of policy that belongs
entirely in userspace.

"mechanism, not policy"

Besides, what do modules have to do with this if we're talking about
UEFI?  Doesn't the kernel have to be loaded before modules are even an
issue?

Pardon me for being lost, just trying to follow this.


Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures

2013-02-15 Thread Raymond Jennings
On Fri, Feb 15, 2013 at 6:20 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> Arrgh...  OK, I'm a blind idiot.  These places in binfmt_elf.c currently use
> force_sig(), not send_sig_info().  Currently == since 2006 when somebody
> noticed the problem.  Their counterparts in binfmt_elf_fdpic.c were *not*
> noticed.  Anyway, that definitely means we want to do it in a single commit;
> the only remaining question is whether we have any problems with somebody
> ptracing such execve() and then poking the sucker with ptrace();

Personally if I was ptracing another process, I'd be flummoxed if I
saw it get nailed with a fatal segfault that I somehow wasn't allowed
to intercept.

An even bigger question might be why an execve is allowed to get into
an unrecoverable state to begin with.  Assuming that one builds the
new mm_struct and whatnot BEFORE discarding old state, why would
execve be in a position for a fatal error in the first place?

> that _can_
> happen with the current mainline for ELF binaries, so this is not something
> new.  I'm low on coffee and about to crash, so I might be missing some
> horrible problem with it, but in this case I'm fairly sure that such a problem
> would be present in current mainline.


Re: Drop support for x86-32

2012-08-24 Thread Raymond Jennings
Some useless troll said:
> nouveau is useless garbage as most open source graphics drivers.

Coming to an open source mailing list like LKML just to bitch about open
source being garbage?  Come on...at least entertain us with better
subtlety.

I'm ready to ignore this guy, how about everyone else?

*plonk*

Ah, much better.




Re: Drop support for x86-32

2012-08-24 Thread Raymond Jennings
On Thu, 2012-08-23 at 12:41 +0200, wbrana wrote:
> Microsoft will drop support for x86-32 in Windows 9.
> Linux could do same.
> http://www.networkworld.com/community/blog/windows-9-details-are-already-emerging

I use an x86-32 system myself.

So do many other people.

Besides, it's not really your call to decide if x86-32 is obsolete.

If it's anyone's call, it's for companies like AMD and Intel that
actually make the chips.  Microsoft doesn't make x86 chips, so their
opinion on x86-32's viability is none of our concern.

Similarly, if I were a marketing director for Pepsi, I wouldn't listen
to anything that Coca-Cola has to say about what flavors of soda to
make.  A problem with the liquid CO2 company I buy my fizz from, however,
WOULD get my attention.




Re: [PATCH] fs: Introducing Lanyard Filesystem

2012-08-20 Thread Raymond Jennings
On Sun, 2012-08-19 at 20:47 -0400, Theodore Ts'o wrote:
> On Mon, Aug 20, 2012 at 01:06:20AM +0200, Carlos Alberto Lopez Perez wrote:
> > 
> > > I also seriously question the niche of people who want to use a thumb
> > > drive to transfer > 4GB files.  Try it sometime and see what a painful
> > > user experience it is
> > 
> > Think, for example, of consumer devices: on most modern TVs
> > you can plug in a USB memory disk with videos and play them.
> 
> More and more consumer devices, including TV's, are network-enabled.
> I'm not at all convinced the USB memory disk model is the one which
> makes sense --- you can make a much better user experience work if you
> can rely on networking.  That way you don't have to move USB storage
> devices around, and USB storage devices are *slow* when the most
> common types are HDD's and crappy flash devices.  How many people are
> going to drop several hundred dollars for a USB-attached SSD, when
> using a networking transfer mechanism is much more convenient?
> 
> > And I doubt that the majority of these consumer devices are able to read
> > anything more than FAT32 file systems, so the 4GB limit is a big problem.
> > And here is where Microsoft is pushing their exFAT FS since it allows
> > working with 4GB+ files without the NTFS overhead.
> 
> We'll see how popular a heavily IP-encumbered file system will be,
> especially given that its main use case is for devices which are so
> constrained that they can't afford to use a "real file system" (like
> ntfs or ext4 or some other more sophisticated file system), but which
> nevertheless needs to be able to handle 4GB+ files.

My two cents:

After seeing Microsoft's attack on TomTom over the vfat patents, I
honestly would consider it a good move to have an alternative free
format available.

> I'm sure there will be some use cases that might fit that niche, but
> it seems pretty tiny.  And this is completely ignoring what might
> happen if in the future people take 1gig fiber connections to the home
> (such as what many people in Kansas City will be enjoying very
> shortly) for granted
> 
> > As a side note, would it be possible to write a driver for exFAT and get
> > it merged upstream in the Linux kernel without "breaking any law"?
> > Googling, I found an attempt to write such a driver, but it seems it never
> > got merged:  https://lkml.org/lkml/2009/2/8/24
> 
> You'll need to talk to a lawyer about that, since that's fundamentally
> a legal question.
> 
> Regards,
> 
>   - Ted

