Hello.

This year, I have been working on fixing various bugs reported by syzbot.
There are three problems on which I am spending a lot of time: bugs in the
loop module, bugs under OOM conditions, and making printk() messages readable.



The loop module, which syzbot uses as infrastructure for testing various
filesystems, has bugs like "crash due to not being thread-safe" and "silent
deadlock due to hiding from lockdep inspection". A series of patches which
should fix 2 bugs reported more than one year ago has just arrived at
linux-next.git
(
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/drivers/block/loop.c?h=next-20181109
),
and we are now monitoring whether these patches also fix other bugs where the
loop module might be the culprit.



Regarding OOM, a deadly fight to fix problems has been in progress for many
years. There is a desperate lack of participants, and a very bad situation
where people do not consider the worst case or do not test patches at all.
MM stands for Memory Management, but it is far from "management" when it comes
to Out Of Memory behavior. Since patches are merged without their correctness
being reviewed, I'm developing reproducers one by one and fixing bugs in
bug-fix patches. I finally got a patch merged into 4.20-rc1 which fixes a
problem where the system silently hangs because a workqueue does not sleep
upon OOM
(
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page_alloc.c?h=v4.20-rc1&id=15f570bf3d13aa94a97234538a5110d18df03aa3
).

As a different problem, since the MMF_OOM_SKIP flag is set too quickly, the
problem that the OOM killer needlessly kills more processes remains. To
mitigate this problem, an approach which hands over setting the MMF_OOM_SKIP
flag to the exiting task was proposed. But since nobody is participating, that
approach is stalled because the correctness of the patch cannot be proven.
Instead, I proposed a timeout-based approach whose correctness can be proven
(
https://lkml.kernel.org/r/1540033021-3258-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp
),
and the conflict between these two approaches remains unresolved.

Then, yet other problems like "lockup caused by in-kernel memory leak" and
"flooding bug caused by memcg OOM handling" were discovered because syzbot
started testing unusual cases. In the former, the system can lock up when
printk() is called massively for reporting an Out Of Memory situation, because
printk() is a slow operation. In the latter, the console becomes unusable when
the memcg OOM killer is unable to find a process to terminate, because printk()
is called forever to report that there is no killable process. Regarding this
problem, a disagreement about how to reduce the frequency of calling printk()
remains.



printk() is a kernel function which corresponds to printf() in userspace
programs. printk() works fine if one line of message (a text string which ends
with '\n') is printed by one printk() call. But when multiple printk() calls
are used for printing one line of message and printk() is called concurrently
from multiple threads, it becomes difficult to parse the printed messages
because multiple messages get mixed or '\n' is emitted more often than needed.

Since fuzz testing repeatedly attempts unusual behavior and/or intentionally
passes unusual arguments, a lot of messages are printed. And it is important
that we can pick out the messages related to unexpected results so that we can
figure out that a problem occurred.

For a userspace program, the global variable "stdout" is shared only within
that process. But in kernel space, all threads on the system share a
"stdout"-equivalent global variable. Therefore, in order to prevent messages
from being mixed, we need to pass a "FILE *fp"-equivalent variable to all
functions which might call printk().
https://lkml.kernel.org/r/3786fdc3-49a5-281f-74cd-c7f37fb06...@i-love.sakura.ne.jp
is an example attempt at doing it. (Or, we need to do "snprintf()"-equivalent
processing before calling printk() so that one line of message is printed by
one printk() call.) Since the kernel is a huge program, you can easily imagine
how difficult it is to replace printf() with fprintf(fp) tree-wide.
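
To make the cost of the "FILE *fp"-equivalent idea concrete, here is a
minimal userspace sketch. It is not the API from the patch linked above; the
names struct line_buf, lb_init(), lb_printf(), lb_flush(), show_task() and
show_mem() are all hypothetical, and fputs() merely stands in for a single
printk() call. The point it illustrates is that every helper which prints has
to grow an extra argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "FILE *fp"-equivalent: a caller-owned buffer for one line. */
struct line_buf {
    char text[128];
    size_t used;
};

static void lb_init(struct line_buf *lb)
{
    lb->used = 0;
    lb->text[0] = '\0';
}

/* Append one fragment; overlong output is truncated, never overflowed. */
static void lb_printf(struct line_buf *lb, const char *fmt, ...)
{
    size_t room;
    va_list ap;
    int n;

    assert(lb->used < sizeof(lb->text));
    room = sizeof(lb->text) - lb->used;
    va_start(ap, fmt);
    n = vsnprintf(lb->text + lb->used, room, fmt, ap);
    va_end(ap);
    if (n < 0) {
        lb->text[lb->used] = '\0';  /* formatting error: keep old content */
        return;
    }
    lb->used += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Emit the accumulated line with a single call, then reset the buffer. */
static void lb_flush(struct line_buf *lb, FILE *out)
{
    if (lb->used)
        fputs(lb->text, out);       /* stands in for one printk() call */
    lb_init(lb);
}

/* Every helper that prints must now take the buffer as an argument. */
static void show_task(struct line_buf *lb, const char *comm, int pid)
{
    lb_printf(lb, "task=%s pid=%d", comm, pid);
}

static void show_mem(struct line_buf *lb, long rss_kb)
{
    lb_printf(lb, " rss=%ldkB\n", rss_kb);
}
```

A caller would do lb_init(), call show_task() and show_mem() with the same
buffer, and then lb_flush() once, so the whole line reaches the output in one
piece regardless of what other threads print in between.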

Since the benefit is small despite the huge modification, I think it is
unlikely that printk() users will update their code to use a new API even if
the printk() subsystem offered such an API. Therefore, I have just proposed a
different approach
(
https://lkml.kernel.org/r/07dcbcb8-c5a7-8188-b641-c110ade1c...@i-love.sakura.ne.jp
)
that behaves as if there were multiple "stdout"-equivalent variables, by
distinguishing printk() callers from within printk().
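
The idea of keeping one pending line per caller can be sketched in userspace
as follows. This is only an illustration under my own assumptions, not the
code of the proposal linked above: the slot table, the integer caller id, and
the names find_slot() and caller_print() are hypothetical, and the sketch is
single-threaded (a real implementation would need locking or per-context
storage). Existing call sites stay unchanged; only the print function itself
looks up which caller it is serving.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NSLOTS 4
#define LINE_MAX_LEN 128

/* One pending (incomplete) line per caller. All names are hypothetical. */
struct slot {
    int in_use;
    int caller;
    char text[LINE_MAX_LEN];
    size_t used;
};

static struct slot slots[NSLOTS];

/* Find the caller's slot, or claim a free one; NULL if the table is full. */
static struct slot *find_slot(int caller)
{
    struct slot *free_slot = NULL;
    int i;

    for (i = 0; i < NSLOTS; i++) {
        if (slots[i].in_use && slots[i].caller == caller)
            return &slots[i];
        if (!slots[i].in_use && !free_slot)
            free_slot = &slots[i];
    }
    if (free_slot) {
        free_slot->in_use = 1;
        free_slot->caller = caller;
        free_slot->used = 0;
        free_slot->text[0] = '\0';
    }
    return free_slot;
}

/*
 * Append a fragment to the caller's pending line. Returns 1 and copies the
 * line to out once it ends with '\n' (or the slot is full); returns 0 while
 * the line is still incomplete. If no slot is free, the fragment is passed
 * through unbuffered.
 */
static int caller_print(int caller, const char *frag, char *out, size_t out_size)
{
    struct slot *s = find_slot(caller);
    size_t len = strlen(frag);
    size_t room;

    assert(out_size > 0);
    if (!s) {
        snprintf(out, out_size, "%s", frag);
        return 1;
    }
    room = sizeof(s->text) - 1 - s->used;
    if (len > room)
        len = room;                 /* truncate instead of overflowing */
    memcpy(s->text + s->used, frag, len);
    s->used += len;
    s->text[s->used] = '\0';
    if (s->used == 0)
        return 0;
    if (s->text[s->used - 1] != '\n' && s->used < sizeof(s->text) - 1)
        return 0;                   /* still waiting for the end of line */
    snprintf(out, out_size, "%s", s->text);
    s->in_use = 0;                  /* release the slot for other callers */
    return 1;
}
```

With this shape, fragments from caller 1 and caller 2 may arrive interleaved,
but each caller's line is handed to the output only when it is complete.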



Well, where will these discussions end up? :-)

_______________________________________________
tomoyo-users-en mailing list
tomoyo-users-en@lists.osdn.me
https://lists.osdn.me/mailman/listinfo/tomoyo-users-en
