Re: Overwriting copy functionality in filesystem

2019-03-24 Thread Valdis Klētnieks
On Sun, 24 Mar 2019 18:48:08 +0530, Bharath Vedartham said:

> I was interested in implementing copy-on-write for my filesystem (for fun
> :P). When I do a "cp" operation, I do not want to create a separate
> inode for the new file. I only want to create an inode when I make a
> change to the file.

Actually, /bin/cp isn't where copy-on-write gets you benefits. Where it really
shines is when you have a versioning filesystem that keeps track of the last
N versions of a file with minimum overhead. So if you have a 100 megabyte
file, open it, write 5 blocks of data, and close it, you now can read back
either the original or new versions of the file, and you're only using 100M plus
5 blocks plus a tiny bit of metadata.

> There is no vfs api for cp. I would need to make creat syscall aware of the
> fact that it is being executed by "cp". My immediate idea was to check
> if a file with the same data exists in the filesystem but that would be
> way too much overhead.

Have you looked at other filesystems that already support copy-on-write?

Hint:  How do file systems that support point-in-time snapshots work?

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: What will happen if 2 processes map same physical page

2019-03-22 Thread Valdis Klētnieks
On Fri, 22 Mar 2019 12:15:49 +0300, Lev Olshvang said:

> But the question might be rephrased: IMHO the kernel should mandate the same PTE
> flags no matter how many virtual mappings were made to the same physical page.

And exactly *why* should it be "mandated"?  Certainly, for many classes of objects,
such as shared libraries, it's a desirable feature (maybe - but see below).

However, there are plenty of *other* use cases where the programmer may want to
have one control process having read/write access to a memory segment, while
a bunch of worker processes are merely reading the data.

For instance, if you're serving out complicated computations to sub-processes that
involve a lot of parameters and input data, the control process already *has* all this
data (potentially megabytes of it) in memory. Using shared memory to transfer it to
the worker processes is a lot more efficient than having to stuff it all through a socket.

And even for shared libraries, you may want one process to be able to write to the
space while others are reading it, for live patching and similar functions.  (Yes, there's
a security trade-off there - and yes, there are sites that will accept the risk, and no,
that sort of trade-off belongs in userspace, not in the kernel.)

The kernel does mechanism, not policy.  So it's totally reasonable to have a
defined way for userspace to say "this page can only be shared with these
permissions" - that's mechanism. Having the kernel force a specific value
without a good architectural reason is policy.

(Sometimes the kernel does force things to work a specific way if it's required
to guarantee system stability.  That's why you can't use the write() system
call on a directory even if you have write permission - you can only use stuff
like link() and open(). Permissions on shared memory pages don't involve that
sort of kernel self-defense issue.)




Re: How to calculate page address to PFN in user space.

2019-04-05 Thread Valdis Klētnieks
On Fri, 05 Apr 2019 10:54:47 -, Pankaj Suryawanshi said:

> I have PFN of all processes in user space, how to calculate page address to PFN.

*All* user processes?  That's going to be a lot of PFNs.  What problem are you trying
to solve here?

(Hint - under what cases does the kernel care about the PFN of *any* user page?)
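For reference, a process can look up the PFN behind one of its *own* virtual addresses through /proc/self/pagemap, which holds one 64-bit entry per virtual page (bits 0-54 are the PFN, bit 63 means "page present"). The sketch below assumes a 64-bit Linux system, and vaddr_to_pfn() is a hypothetical helper name. Note that since Linux 4.2, readers without CAP_SYS_ADMIN see the PFN field zeroed, which is one more reason to pin down what you actually need.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Look up the pagemap entry for vaddr in the calling process.
   Returns 0 on success; *pfn reads as 0 for unprivileged callers. */
static int vaddr_to_pfn(uintptr_t vaddr, int *present, uint64_t *pfn)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    int fd;

    if (pagesz <= 0)
        return -1;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(vaddr / (uintptr_t)pagesz) * (off_t)sizeof entry;
    ssize_t n = pread(fd, &entry, sizeof entry, off);
    close(fd);
    if (n != (ssize_t)sizeof entry)
        return -1;
    *present = (entry & PM_PRESENT) != 0;
    *pfn = entry & PM_PFN_MASK;
    return 0;
}
```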



Re: Unexpected scheduling with mutexes

2019-03-29 Thread Valdis Klētnieks
On Fri, 29 Mar 2019 21:01:58 +0100, Greg KH said:

> But if you are trying to somehow create a real api that you have to
> enforce the passing off of writing data from two different character
> devices in an interleaved format, you are doing this totally wrong, as
> this is not going to work with a simple mutex, as you have found out.

There's almost always an even more fundamental issue here - I've seen plenty of
people attempt to do this sort of thing.  But invariably, they have little to
no explanation of what semantics they think are correct. I'm not sure who is
crazier - the people who try to do kernel-side locking for "exclusive" use of a
device, or the people who don't understand why having 3 different programs
trying to talk to /dev/ttyS0 at once will only lead to tears and anguish...

(Though recently, I discovered that there are no bad ideas so obvious that
somebody won't try to re-invent them.  I caught a software package that *really*
should know better using "does DBus have an entry for this object?" as a lock.)

> Try to take USB out of the picture as well as userspace, and try running
> two kernel threads trying to grab a mutex and then print out "A" or "B"
> to the kernel log and then give it up.  Is that output nicely
> interleaved, or are there some duplicated messages?[1]

> [1] Extra bonus points for those that recognize this task...

Been there, done that, got the tire marks to prove it. :)
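Greg's exercise can be approximated in userspace (an assumption: pthreads rather than kernel threads, and run_pingpong()/worker() are hypothetical names). A plain mutex only guarantees that one thread is inside the critical section at a time; it makes no promise about who gets the lock next, so "AAB..." is perfectly legal output. Getting strict A/B alternation takes explicit turn-taking state, sketched here as a 'turn' variable plus a condition variable. Drop the turn check (just lock, print, unlock) and you are back to Greg's experiment.

```c
#include <pthread.h>
#include <stddef.h>

#define ROUNDS 5

/* The mutex alone only provides mutual exclusion;
   'turn' plus the condition variable is what enforces the A/B hand-off. */
struct pingpong {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    char turn;                  /* whose turn it is: 'A' or 'B' */
    char out[2 * ROUNDS + 1];   /* collected output plus terminating NUL */
    size_t len;
};

struct worker_arg {
    struct pingpong *pp;
    char me;                    /* letter this thread "prints" */
    char next;                  /* whose turn it is afterwards */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    struct pingpong *pp = a->pp;

    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&pp->lock);
        while (pp->turn != a->me)       /* not our turn: sleep, releasing the lock */
            pthread_cond_wait(&pp->cond, &pp->lock);
        pp->out[pp->len++] = a->me;     /* the "print" */
        pp->turn = a->next;             /* hand off to the other thread */
        pthread_cond_broadcast(&pp->cond);
        pthread_mutex_unlock(&pp->lock);
    }
    return NULL;
}

/* Run one 'A' thread and one 'B' thread; returns the collected output. */
static const char *run_pingpong(struct pingpong *pp)
{
    pthread_t ta, tb;
    struct worker_arg aa = { pp, 'A', 'B' };
    struct worker_arg ab = { pp, 'B', 'A' };

    pthread_mutex_init(&pp->lock, NULL);
    pthread_cond_init(&pp->cond, NULL);
    pp->turn = 'A';
    pp->len = 0;
    pthread_create(&ta, NULL, worker, &aa);
    pthread_create(&tb, NULL, worker, &ab);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pp->out[pp->len] = '\0';
    pthread_cond_destroy(&pp->cond);
    pthread_mutex_destroy(&pp->lock);
    return pp->out;
}
```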



Re: Overwriting copy functionality in filesystem

2019-03-28 Thread Valdis Klētnieks
On Fri, 29 Mar 2019 00:00:17 +0530, Bharath Vedartham said:

> I was thinking of a use case where we are copying a huge file (say 100
> GB), if we do copy-on-write we can speed up /bin/cp for such files, I
> feel. Any comments on this?

Hmm.. wait a minute.  What definition of "copy on write" are you using?

Hint - if you're copying an *entire* 100GB file, the *fastest* way is to simply
make a second hard link to the file. If you're determined to make an entire
second copy, you're going to be reading 100GB and writing 100GB, and the
exact details aren't going to matter all that much.

Now, where you can get clever is if you create your 100GB file, and then
somebody only changes 8K of the file.  There's no need to copy all 100GB into a
new file if you are able to record "oh, and this 8K got changed". You only need
to write the 8K of changes, and some metadata.
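A toy, in-memory sketch of that idea, assuming fixed 8K blocks and a per-block reference count (struct block, cow_clone() and cow_write() are made-up names, not any real filesystem's API): "copying" a file only shares block pointers, and a later write copies just the one block it touches.

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 8192   /* assumed 8K blocks, matching the example above */
#define NBLOCKS    4      /* a tiny "file" for illustration */

/* A data block that may be shared by several file versions. */
struct block {
    int refcount;
    char data[BLOCK_SIZE];
};

/* A file version is just a table of block pointers (plus metadata in real life). */
struct version {
    struct block *blk[NBLOCKS];
};

/* "Copy" a file: share every block and bump its refcount. No data moves. */
static void cow_clone(struct version *dst, const struct version *src)
{
    for (int i = 0; i < NBLOCKS; i++) {
        dst->blk[i] = src->blk[i];
        dst->blk[i]->refcount++;
    }
}

/* Write into block idx. If that block is shared, copy only that block first,
   so the other version keeps pointing at the unmodified data. */
static int cow_write(struct version *v, int idx, size_t off,
                     const char *buf, size_t len)
{
    if (idx < 0 || idx >= NBLOCKS || off > BLOCK_SIZE || len > BLOCK_SIZE - off)
        return -1;

    struct block *b = v->blk[idx];
    if (b->refcount > 1) {
        struct block *copy = malloc(sizeof(*copy));
        if (!copy)
            return -1;
        memcpy(copy->data, b->data, BLOCK_SIZE);
        copy->refcount = 1;
        b->refcount--;          /* old block stays with the other version(s) */
        v->blk[idx] = copy;
        b = copy;
    }
    memcpy(b->data + off, buf, len);
    return 0;
}
```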

(Similar tricks are used for shared libraries and pre-zeroed storage.  Everybody
gets a reference to the same copy of the page(s) in memory - until somebody
scribbles on a page.

So say you have a 30MB shared object in memory, with 5 users.  That's 5 references
to the same data.  Now one user writes to it.  The system catches that write (usually
via a page fault), copies just the one page to a new page, and then lets the write to
the new page complete.  Now we have 5 users that all have references to the same
(30M-4K) of data, 4 users that have a reference to the old copy of that 4K, and one
user that has a reference to the modified copy of that 4K.)
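The page-level version can be observed from userspace on Linux: two MAP_PRIVATE mappings of the same file page start out sharing it, and the first write through one of them triggers the fault-and-copy described above, so the other mapping keeps seeing the original bytes. A minimal sketch, assuming a POSIX system with tmpfile() and mmap(); private_cow_demo() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one file page twice with MAP_PRIVATE, write through the first mapping,
   and report what each mapping sees afterwards. Returns 0 on success. */
static int private_cow_demo(char *a_out, char *b_out, size_t outlen)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    int rc = -1;

    if (!f)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, pagesz) == 0 && pwrite(fd, "original", 9, 0) == 9) {
        char *a = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        char *b = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            memcpy(a, "modified", 9);           /* fault: 'a' gets its own copy of the page */
            snprintf(a_out, outlen, "%s", a);
            snprintf(b_out, outlen, "%s", b);   /* still the shared, unmodified page */
            rc = 0;
        }
        if (a != MAP_FAILED)
            munmap(a, (size_t)pagesz);
        if (b != MAP_FAILED)
            munmap(b, (size_t)pagesz);
    }
    fclose(f);
    return rc;
}
```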

https://en.wikipedia.org/wiki/Copy-on-write



Re: maintainer for this update?

2019-03-28 Thread Valdis Klētnieks
On Thu, 28 Mar 2019 14:53:28 -0700, Dave Stevens said:
> In the article below about kernel updates is a reference to a driver for
> a Plantower PMS7003 particulate matter sensor. I'm using this series in
> air pollution monitoring and its usual interface to the computer
> is over a serial line. I consequently am very curious about how and why
> it would need a specific kernel module.

A quick look at the driver shows that it apparently does at least
a little bit of the heavy lifting for you - rather than having to do all the
serial I/O interpretation yourself in userspace, it will handle checksums and
channels and so on for you.

+enum pms7003_cmd {
+   CMD_WAKEUP,
+   CMD_ENTER_PASSIVE_MODE,
+   CMD_READ_PASSIVE,
+   CMD_SLEEP,
+};
+
+/*
+ * commands have following format:
+ *
+ * +------+------+-----+------+-----+-----------+-----------+
+ * | 0x42 | 0x4d | cmd | 0x00 | arg | cksum msb | cksum lsb |
+ * +------+------+-----+------+-----+-----------+-----------+
+ */
+static const u8 pms7003_cmd_tbl[][PMS7003_CMD_LENGTH] = {
+   [CMD_WAKEUP] = { 0x42, 0x4d, 0xe4, 0x00, 0x01, 0x01, 0x74 },
+   [CMD_ENTER_PASSIVE_MODE] = { 0x42, 0x4d, 0xe1, 0x00, 0x00, 0x01, 0x70 },
+   [CMD_READ_PASSIVE] = { 0x42, 0x4d, 0xe2, 0x00, 0x00, 0x01, 0x71 },
+   [CMD_SLEEP] = { 0x42, 0x4d, 0xe4, 0x00, 0x00, 0x01, 0x73 },
+};

So you can send 'wakeup' or 'read' and it will send the strings, and
then read back data and complete the I/O when enough bytes have
arrived.
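Judging from the table, the checksum looks like the 16-bit sum of the five preceding bytes: for CMD_WAKEUP, 0x42 + 0x4d + 0xe4 + 0x00 + 0x01 = 0x0174, i.e. msb 0x01 and lsb 0x74, and the other three entries work out the same way. That is an inference from the table, not something checked against the datasheet, and pms7003_build_cmd() below is a hypothetical helper, not a function in the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define PMS7003_CMD_LENGTH 7

/* Fill out[] with a command frame laid out as in the driver's comment:
   0x42 0x4d cmd 0x00 arg cksum_msb cksum_lsb.
   Assumption: the checksum is the 16-bit sum of the first five bytes. */
static void pms7003_build_cmd(uint8_t cmd, uint8_t arg,
                              uint8_t out[PMS7003_CMD_LENGTH])
{
    uint16_t sum = 0;

    out[0] = 0x42;
    out[1] = 0x4d;
    out[2] = cmd;
    out[3] = 0x00;
    out[4] = arg;
    for (size_t i = 0; i < 5; i++)
        sum = (uint16_t)(sum + out[i]);
    out[5] = (uint8_t)(sum >> 8);
    out[6] = (uint8_t)(sum & 0xff);
}
```

Feeding it the cmd/arg bytes from each table row reproduces that row's checksum bytes.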

There's also some probing in there to deal with device tree etc.



Re: Where is PageHead defined in v5.0?

2019-03-27 Thread Valdis Klētnieks
On Wed, 27 Mar 2019 17:23:05 -0700, Igor Pylypiv said:
> and TESTPAGEFLAG defines PageHead:
> #define TESTPAGEFLAG(uname, lname, policy) \
> static __always_inline int Page##uname(struct page *page)
> 
> (https://elixir.bootlin.com/linux/v5.0.5/source/include/linux/page-flags.h#L215)

General tip:  If you're trying to find where the kernel defines FooBar,
and 'git grep FooBar' only finds uses and no definitions, it probably
means somebody got over-exuberant with the ## pre-processor operator.

'git grep Foo##' usually reveals the culprit. ;)
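A stripped-down illustration, using made-up names (struct fake_page, FAKE_TESTPAGEFLAG) rather than the real page-flags.h machinery: PageHead() and PageLocked() exist after preprocessing, but neither name is spelled out anywhere in the source.

```c
#include <stdbool.h>

/* Made-up stand-ins for struct page and its flag bits. */
struct fake_page {
    unsigned long flags;
};

enum { FAKE_PG_head = 0, FAKE_PG_locked = 1 };

/* Generates a function named Page<uname>() testing bit FAKE_PG_<lname>.
   Neither the function name nor the enum name appears here in full. */
#define FAKE_TESTPAGEFLAG(uname, lname)                          \
static inline bool Page##uname(const struct fake_page *page)    \
{                                                                \
    return (page->flags & (1UL << FAKE_PG_##lname)) != 0;        \
}

FAKE_TESTPAGEFLAG(Head, head)      /* defines PageHead() */
FAKE_TESTPAGEFLAG(Locked, locked)  /* defines PageLocked() */
```

Here 'git grep PageHead' would only turn up callers, while 'git grep Page##' lands right on the macro.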



Re: Questions about my idea (for gsoc)

2019-04-01 Thread Valdis Klētnieks
On Mon, 01 Apr 2019 18:19:12 +0200, Andrea Laisa said:

> My idea is to write a Linux driver for a proprietary network protocol 
> used by the outdoor Gemtek antennas (which are essentially an LTE modem)

This may be a legal minefield.  To create a *usable* Linux driver, you'll need
to have one that's free of licensing restrictions.  If Gemtek claims one or
more patents on the protocol, you may be unable to implement the protocol
without dealing with the licensing issues.

And I'm reasonably sure that Gemtek won't be too interested in helping you
write the driver, because its existence would cut into their sales of access points.

Note that this is *different* from the question of reverse engineering hardware
or software for interoperability. Patents cover any use whatsoever of the patented
intellectual property, and if you reverse engineer and then implement, you
can't claim accidental infringement (which happens a lot, especially if the
patent office is lax on enforcing the "non-obvious" requirement) and you're
squarely in willful infringement territory, which in the US can get you hit
with triple damages...

(Note - I am not a lawyer, just somebody who reads enough about intellectual
property law to know that anytime you see "proprietary", you are *guaranteed*
to hit some sort of intellectual property law issue.)



Re: Overwriting copy functionality in filesystem

2019-03-23 Thread Valdis Klētnieks
On Sat, 23 Mar 2019 22:29:45 +0530, Bharath Vedartham said:

> I was wondering how we can overwrite the copy functionality while
> writing our own filesystem in linux.
> VFS does not offer any sort of API for copy. I think it calls create and
> write when we copy the file/dir.

Which you can verify using strace.  Which you should already be familiar with
if you have the experience needed to write a usable filesystem.
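As a rough idea of what to expect in the simplest case, a naive copy is just open() on the source, open(O_CREAT|O_TRUNC) on the destination, and a read()/write() loop - which is all the filesystem gets to see. This is an assumption about a minimal copier, not what any particular cp does; real implementations may issue other system calls, which is exactly why running strace on your own system is worth doing. naive_copy() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src to dst with nothing but open/read/write/close.
   Returns 0 on success, -1 on error (EINTR handling omitted for brevity). */
static int naive_copy(const char *src, const char *dst)
{
    char buf[65536];
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* a brand-new file to the VFS */
    if (out < 0) {
        close(in);
        return -1;
    }

    ssize_t n = 0;
    int rc = 0;
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* handle short writes */
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                rc = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    if (rc == 0 && n < 0)
        rc = -1;
    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}
```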

> I am interested in overwriting the way copy happens in my
> filesystem(which I am writing for fun :p). 

And what, exactly, do you want copy to do differently on the API level,
and on the file system level?




Re: What will happen if 2 processes map same physical page

2019-03-20 Thread Valdis Klētnieks
On Wed, 20 Mar 2019 16:42:39 +0300, Lev Olshvang said:
> The question is: is it possible in Linux/MMU/TLB that 2 processes map to
> the same physical address?

Totally possible.  That's how mmap shared memory works, and why shared
libraries are possible.

> Will CPU or TLB discover that second process tries to reach occupied physical page?

Well, the hardware won't discover it as a "second" process, it only knows it's
processing *this* memory access.

> What if first process set page permission to read and second wants to write
> to this page ?

Perfectly OK - the two processes have separate page table mappings, with
separate permission bits. So (for example) physical page 0x17F000 is mapped to
virtual address 0x2034D000 with read-only permission in process 1's page tables,
and to virtual address 0x98FF3000 with read-write permission in process 2's
page tables. No problem.
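This is easy to see from userspace. The sketch below (assuming Linux or another POSIX system; shared_view_demo() is a hypothetical name) maps the same page of a temporary file twice with MAP_SHARED, once read-only and once read-write. It uses one process rather than two to keep it short, but the principle is the same: two virtual addresses, two page table entries with different permission bits, one physical page.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one page of a temporary file twice: read-only and read-write.
   A write through the RW view lands in the shared page, so the RO view sees it.
   Returns 0 on success and copies what the RO view sees into out. */
static int shared_view_demo(char *out, size_t outlen)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    int rc = -1;

    if (!f)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, pagesz) == 0) {
        char *ro = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, 0);
        char *rw = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ro != MAP_FAILED && rw != MAP_FAILED && ro != rw) {
            strcpy(rw, "hello");                /* write via the read-write mapping */
            snprintf(out, outlen, "%s", ro);    /* read via the read-only mapping */
            rc = 0;
        }
        if (ro != MAP_FAILED)
            munmap(ro, (size_t)pagesz);
        if (rw != MAP_FAILED)
            munmap(rw, (size_t)pagesz);
    }
    fclose(f);
    return rc;
}
```

Attempting to write through 'ro' instead would get a SIGSEGV, since only that mapping's PTE lacks write permission.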

(And before you ask, yes it's possible for process 2 to be running on one core
doing a write to the page at the exact same time that process 1 is doing a read
on another core.  Depending on the hardware cache design, this may or may not
give process 1 the updated data.  This is why locking and memory barriers are
important. See Documentation/memory-barriers.txt for more details.)

"And then there's the Alpha" - a processor design that got much of its speed by
being weird about this stuff. :)

> Perhaps during context switch all page access permissions of first process are
> flushed out from MMU ?

Actually, the kernel just points the MMU at a new set of page table entries and lets
the TLB reload as needed. In particular, on most architectures, the kernel tries really
hard to ensure that all processes share at least part of their page table mappings so
the kernel is always mapped at the same place, meaning that there's a better chance
that on a syscall, the TLB already has hot entries for large parts of the kernel so no
TLB reloads are needed.




Re: fsync slowness + XFS -- Regd.

2019-03-19 Thread Valdis Klētnieks
On Tue, 19 Mar 2019 07:40:29 +0530, Jeno P said:

> One of my modules *(C/C++)* is writing to log files and periodically flushes
> it using *fsync()*. Even for small amount of data, *fsync* is taking more
> time *(>15 seconds)* than expected.

Is it possible that some other process is applying a file lock to the log file?
Any signs of I/O errors? 15 seconds is the sort of delay you might see on a
RAID controller with one dead or partially dead disk throwing timeouts...
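To narrow it down, it can help to time the fsync() calls themselves and see whether every call is slow or only some of them, then line the slow ones up against dmesg for I/O errors or timeouts. A minimal sketch, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC); append_and_time_fsync() is a hypothetical helper, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Append msg to the log at path, then time only the fsync() call.
   Returns elapsed milliseconds, or -1.0 on any error. */
static double append_and_time_fsync(const char *path, const char *msg)
{
    struct timespec t0, t1;
    double ms = -1.0;
    size_t len = strlen(msg);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if (fd < 0)
        return -1.0;
    if (write(fd, msg, len) == (ssize_t)len &&
        clock_gettime(CLOCK_MONOTONIC, &t0) == 0 &&
        fsync(fd) == 0 &&
        clock_gettime(CLOCK_MONOTONIC, &t1) == 0)
        ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
             (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    close(fd);
    return ms;
}
```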




Re: development in python

2019-05-12 Thread Valdis Klētnieks
On Sun, 12 May 2019 19:05:29 +0200, Tomáš Roj said:

> I know that everything in the kernel is written in C but are there also
> some fields where can I code in Python? I mean some websites etc.

Not a whole lot of Python being used for building websites.  Now if you meant
websites that have projects written in Python, I'd suggest giving GitHub
and SourceForge a poke, I'm sure you'll find plenty of Python based projects.




Re: how to collect information regarding function calls in run time?

2019-05-17 Thread Valdis Klētnieks
On Tue, 14 May 2019 16:11:51 -0300, Pedro Terra Delboni said:

> I agree that the question alone seems like a weird one, I just assumed
> when I wrote my first email that explaining the motivation would
> only consume time of the reader.

Asking "what problem are you trying to solve" is a standard question, because
whenever a programmer is saying "I can't get X to do Y", a good 85% of the time
it turns out that X isn't working because using W to do Z is the
already-existing API for what they actually wanted to do.

> The subject I'm working on is Control-Flow Integrity, which instruments
> code so that each indirect jump (usually a return or an
> indirect call) verifies that the address it is jumping to is a valid
> one (so there is a code stub that runs in every function call and
> return).

> The reason I want to count call instructions execution is because the
> function return tied to the most executed call instruction will be the
> one that will cause the greater increase in execution time, so by
> inlining that call we'll be exchanging this cost for the cache impact
> of the code expansion (as the code stub won't exist anymore for this
> call).

I suspect that the vast majority of functions that are *that* heavily used are
either (a) already inlined or (b) too large to inline - for instance, kmalloc
is used heavily, but having separate inlined copies everyplace to avoid the
return statement is going to bloat the code - and even worse, make almost all
the inline copies cache-cold instead of one shared cache-hot chunk of 2K.

And the question we *should* be asking is *not* "is the return address a plausible
one".  It's "is the return address *the one we were called from*".  Checking
whether kmalloc is about to return to a valid call point doesn't tell you much.
Finding out that kmalloc is about to return to one of the 193,358 *other* call
points rather than the one it was actually called from is something big.
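The difference between the two checks can be sketched with a toy shadow stack, the usual technique for the stronger check. This is an illustration under assumptions (made-up names, fixed depth, addresses passed in by hand), not how any particular CFI implementation instruments code: the coarse check only asks whether the target is *some* valid return site, while the shadow stack remembers exactly where each call has to return to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64

/* Hypothetical per-thread shadow stack of expected return addresses. */
struct shadow_stack {
    uintptr_t ret[SHADOW_DEPTH];
    size_t top;
};

/* Coarse check: "is the return address a plausible one?" */
static bool is_plausible_return(uintptr_t target, const uintptr_t *sites, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (sites[i] == target)
            return true;
    return false;
}

/* On call: record exactly where this call must return to. */
static bool shadow_push(struct shadow_stack *s, uintptr_t ret)
{
    if (s->top == SHADOW_DEPTH)
        return false;
    s->ret[s->top++] = ret;
    return true;
}

/* On return: "is the return address the one we were called from?" */
static bool shadow_check_return(struct shadow_stack *s, uintptr_t target)
{
    if (s->top == 0)
        return false;
    return s->ret[--s->top] == target;
}
```

A return to another legitimate call site passes is_plausible_return() but fails shadow_check_return(), which is the case that matters.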







Re: GPIO Driver for Skylake-Y PCH

2019-06-14 Thread Valdis Klētnieks
On Fri, 14 Jun 2019 12:01:28 -0700, you said:

> > static const struct pci_device_id pch_gpio_pcidev_id[] = {
> >  { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x8803) },
> >  { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8014) },
> >  { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8043) },
> >  { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8803) },
> >  { 0, }
> > };
> > MODULE_DEVICE_TABLE(pci, pch_gpio_pcidev_id);

> It is a PCI device with 8086/9d20 IDs.

Give this patch a try, if it works I'll push it upstream for you...

diff --git a/drivers/gpio/gpio-pch.c b/drivers/gpio/gpio-pch.c
index 1d99293096f2..19884b5b2a74 100644
--- a/drivers/gpio/gpio-pch.c
+++ b/drivers/gpio/gpio-pch.c
@@ -439,6 +439,7 @@ static SIMPLE_DEV_PM_OPS(pch_gpio_pm_ops, pch_gpio_suspend, pch_gpio_resume);
 
 static const struct pci_device_id pch_gpio_pcidev_id[] = {
    { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x8803) },
+   { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x9d20) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8014) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8043) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8803) },





Re: GPIO Driver for Skylake-Y PCH

2019-06-14 Thread Valdis Klētnieks
On Fri, 14 Jun 2019 10:58:53 -0700, "Alexander Ivanov" said:

> I have a hardware platform with Skylake i7-6500 CPU and Skylake-Y PCH
> southbridge, running 4.8.5 kernel fc25. The platform has 12 GPIO pins, however,
> none are available. gpio-pch driver does not support D31:F2 device that manages
> GPIO.

> Am I missing something here?

Well.. my copy of drivers/gpio/gpio-pch.c has this near line 440:

static const struct pci_device_id pch_gpio_pcidev_id[] = {
    { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x8803) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8014) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8043) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8803) },
    { 0, }
};
MODULE_DEVICE_TABLE(pci, pch_gpio_pcidev_id);

Though I'm having a hard time aligning that with "D31:F2". Are you confusing
a PCI address with a PCI ID, or is this on a non-PCI bus?
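For what it's worth, the two notations are easy to mix up. "D31:F2" in Intel datasheets is a device/function number, i.e. a PCI *address* (device 31 is 0x1f in lspci's bus:device.function form), while PCI_DEVICE() entries in the table above match on vendor:device *IDs* such as 8086:9d20. A small sketch that parses each form; parse_pci_bdf() and parse_pci_id() are hypothetical helpers, assuming lspci-style input without a domain prefix.

```c
#include <stdio.h>

/* Parse a PCI address "bus:device.function", e.g. "00:1f.1".
   This says *where* the device sits: device 0x1f is D31 in datasheet terms. */
static int parse_pci_bdf(const char *s, unsigned *bus, unsigned *dev, unsigned *fn)
{
    if (sscanf(s, "%x:%x.%x", bus, dev, fn) != 3)
        return -1;
    if (*bus > 0xff || *dev > 0x1f || *fn > 7)
        return -1;
    return 0;
}

/* Parse a "vendor:device" ID pair, e.g. "8086:9d20".
   This says *what* the device is, and it is what a driver's id table matches. */
static int parse_pci_id(const char *s, unsigned *vendor, unsigned *device)
{
    if (sscanf(s, "%x:%x", vendor, device) != 2)
        return -1;
    return (*vendor > 0xffff || *device > 0xffff) ? -1 : 0;
}
```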




Re: GPIO Driver for Skylake-Y PCH

2019-06-15 Thread Valdis Klētnieks
On Fri, 14 Jun 2019 15:40:59 -0700, "Alexander Ivanov" said:

(Adding likely knowledgeable  people to the recipients)

Jean, Andy, Linus: The situation thus far:  Alexander has a system with this GPIO on it:

> lspci -vvvnns 1f.1
> 00:1f.1 Memory controller [0580]: Intel Corporation Device [8086:9d20] (rev 21)
>  Subsystem: Gigabyte Technology Co., Ltd Device [1458:1000]
>  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>  Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-  SERR-   Latency: 0
>  Region 0: Memory at 7d00 (64-bit, non-prefetchable) [size=16M]

The obvious first thing to try was:

diff --git a/drivers/gpio/gpio-pch.c b/drivers/gpio/gpio-pch.c
index 1d99293096f2..19884b5b2a74 100644
--- a/drivers/gpio/gpio-pch.c
+++ b/drivers/gpio/gpio-pch.c
@@ -439,6 +439,7 @@ static SIMPLE_DEV_PM_OPS(pch_gpio_pm_ops, pch_gpio_suspend, pch_gpio_resume);
 
 static const struct pci_device_id pch_gpio_pcidev_id[] = {
    { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x8803) },
+   { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x9d20) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8014) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8043) },
    { PCI_DEVICE(PCI_VENDOR_ID_ROHM, 0x8803) },

and that died thusly when we attempted to load it:

[ 105.965846] pci :00:1f.1: [8086:9d20] type 00 class 0x058000
[ 105.965928] pci :00:1f.1: reg 0x10: [mem 0xfd00-0xfdff 64bit]
[ 105.967084] pci :00:1f.1: BAR 0: assigned [mem 0x7d00-0x7dff 64bit]
[ 105.978037] pch_gpio :00:1f.1: pch_gpio_probe : pci_iomap FAILED
[ 105.978194] pch_gpio :00:1f.1: pch_gpio_probe Failed returns -12
[ 105.978317] pch_gpio: probe of :00:1f.1 failed with error -12
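The -12 in those messages is a negated errno value, the usual kernel convention for error returns; on Linux, 12 is ENOMEM, which is consistent with the pci_iomap() failure logged just before it. A trivial userspace helper to decode such values (describe_kernel_err() is a hypothetical name):

```c
#include <errno.h>
#include <string.h>

/* Kernel functions report failure as a negative errno (e.g. -12).
   Turn such a return value back into a positive errno and a message. */
static const char *describe_kernel_err(int ret, int *errnum)
{
    *errnum = ret < 0 ? -ret : ret;
    return strerror(*errnum);
}
```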

So this is obviously an older kernel.  Not sure what release Alexander is on, but
a 'git log' against this week's linux-next tree didn't show anything that was
an obvious fix for a similar problem.

Any ideas?





Re: GPIO Driver for Skylake-Y PCH

2019-06-18 Thread Valdis Klētnieks
On Tue, 18 Jun 2019 11:40:34 +0300, Andy Shevchenko said:

> Yes. Most of the SoCs from Intel use GPIO IP based on Chassis specification,
> the drivers for which are available under drivers/pinctrl/intel. What you are
> looking for is located under PINCTRL_SUNRISEPOINT configuration option.

Thanks for the info, it's often unclear where to look - when the hardware has
a PCH and documentation that says it has GPIO, and there's an in-tree driver
called gpio_pch, it's easy to fail to look in the right place :)




Re: [IMX] [DRM]: suspend/resume support

2019-06-19 Thread Valdis Klētnieks
On Wed, 19 Jun 2019 20:47:34 +0530, Pintu Agarwal said:

> No I mean to say, there are lots of features and customization already
> done on this version and stabilized.
> Upgrading again may require months of effort.

This is what happens when you don't upstream your local changes.

And no, saying "But we're a small company and nobody cares" isn't an
excuse - Linux carried the entire Voyager architecture around for several years
for 2 machines. Not two models, 2 physical machines, the last 2 operational
systems of the product line.

(Not the Xubuntu-based Voyage distribution either - the Voyager was a mid-80s
SMP fault-tolerant system from NCR with up to 32 486/586 cores and 4G of
memory, which was a honking big system for the day...)

https://kernel.googlesource.com/pub/scm/linux/kernel/git/rzhang/linux/+/v2.6.20-rc1/Documentation/voyager.txt

The architecture was finally dropped in 2009 when enough hardware failures
had happened that James Bottomley was unable to create a bootable
system from the parts from both...

So if your production run is several thousand systems, that's *plenty* big
enough for patches and drivers (especially since drivers for hardware you
included in your several-thousand system run are also likely applicable to
a half dozen other vendors who made several thousand systems using the
same chipset).




Re: GPIO Driver for Skylake-Y PCH

2019-06-15 Thread Valdis Klētnieks
On Sat, 15 Jun 2019 12:38:34 -0700, "Alexander Ivanov" said:

> This is fedora 25 running 4.8.6 kernel.

It probably won't fix the problem, but you should upgrade if at all possible.
You're not getting any security patches for 25.  30 is the current release,
with 31 due out fairly soon.





Re: how to determine whether the source code is same between two kernels

2019-05-11 Thread Valdis Klētnieks
On Sat, 11 May 2019 09:20:22 -0400, Aruna Hewapathirane said:

> Seriously ? Since when are you working for turing-police ?

I'm semi-retired. :)

And there are two other systems in the room called wintermute and neuromancer.
Hopefully you can figure it out from there. :)




Re: how to determine whether the source code is same between two kernels

2019-05-10 Thread Valdis Klētnieks
On Fri, 10 May 2019 22:11:31 -0400, Aruna Hewapathirane said:

> > Suppose I have two kernels, one is A.B.C built by Tom. And
> > the other is A.B.C built by Jerry. The source code has been deleted

> Run diff vmlinuz-Tom vmlinuz-Jerry and see if they differ. Then just  to

Don't even bother.  If Tom and Jerry both did builds, the binaries *will* differ, because...

% dmesg | grep 'Linux vers'
[0.00] Linux version 5.1.0-rc5-next-20190416-dirty (source@turing-police) (gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)) #664 SMP PREEMPT Wed Apr 17 12:31:51 EDT 2019

There's a datestamp, a build number, and a compiler version in there.

Also, since vmlinuz is a binary file, /bin/cmp is a better choice than diff.
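If you do want to see *where* two binaries first differ, which is what cmp reports, the idea is just a byte-by-byte walk. A minimal sketch using only stdio; first_difference() is a hypothetical name, and for brevity it treats a read error the same as end-of-file.

```c
#include <stdio.h>

/* Compare two files byte by byte, in the spirit of cmp(1).
   Returns -1 if identical, -2 if either file can't be opened, otherwise the
   0-based offset of the first differing byte (a shorter file differs at its EOF). */
static long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long result = -2;

    if (a && b) {
        for (long off = 0;; off++) {
            int ca = fgetc(a);
            int cb = fgetc(b);
            if (ca != cb) {
                result = off;
                break;
            }
            if (ca == EOF) {        /* both ended at the same offset */
                result = -1;
                break;
            }
        }
    }
    if (a)
        fclose(a);
    if (b)
        fclose(b);
    return result;
}
```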




[PATCH] scripts/spdxcheck.py - fix list of directories to check

2019-05-12 Thread Valdis Klētnieks
After this commit:

commit 62be257e986dab439537b3e1c19ef746a13e1860
Author: Christoph Hellwig 
Date:   Tue Apr 30 06:51:30 2019 -0400

LICENSES: Rename other to deprecated

checkpatch throws an error:

[/usr/src/linux-next]2 scripts/checkpatch.pl -f drivers/staging/rtl8712/rtl871x_rf.h
FAIL: "Blob or Tree named 'other' not found"
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 240, in 
    spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 41, in read_spdxdata
    for el in lictree[d].traverse():
  File "/usr/lib/python2.7/site-packages/git/objects/tree.py", line 298, in __getitem__
    return self.join(item)
  File "/usr/lib/python2.7/site-packages/git/objects/tree.py", line 244, in join
    raise KeyError(msg % file)
KeyError: "Blob or Tree named 'other' not found"

Fix directory search list. Pick up the new LICENSES/dual while we're there...

Reported-by: Deepak Mishra 
Signed-off-by: Valdis Kletnieks 

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 4fe392e507fb..7abd5f5cb14d 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -32,7 +32,7 @@ import os
 def read_spdxdata(repo):
 
     # The subdirectories of LICENSES in the kernel source
-    license_dirs = [ "preferred", "other", "exceptions" ]
+    license_dirs = [ "preferred", "dual", "deprecated", "exceptions" ]
     lictree = repo.head.commit.tree['LICENSES']
 
     spdx = SPDXdata()




Re: Checkpatch.pl FAIL: "Blob or Tree named 'other' not found"

2019-05-12 Thread Valdis Klētnieks
On Sun, 12 May 2019 11:45:24 +0530, Deepak Mishra said:

> When I run checkpatch.pl, for every file I get the following or similar
> error. Could you please help if this is my environment issue or actually
> error in code which I need to fix ?
>
> I executed the following in command prompt.
> perl scripts/checkpatch.pl -f drivers/staging/rtl8712/* |less
>
> The error I observe
> drivers/staging/rtl8712/rtl871x_rf.h

Found the problem, patch submitted.  It's a one-line fix:

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 4fe392e507fb..7abd5f5cb14d 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -32,7 +32,7 @@ import os
 def read_spdxdata(repo):
 
     # The subdirectories of LICENSES in the kernel source
-    license_dirs = [ "preferred", "other", "exceptions" ]
+    license_dirs = [ "preferred", "dual", "deprecated", "exceptions" ]
     lictree = repo.head.commit.tree['LICENSES']







Re: how to collect information regarding function calls in run time?

2019-05-14 Thread Valdis Klētnieks
On Tue, 14 May 2019 10:55:40 -0300, Pedro Terra Delboni said:

> Regarding bpftrace: This seemed like the best option since I could use it
> to count frames of the stack with depth 2, allowing me to know precisely
> the amount of times each specific call has been made. However, I could not
> use it because since I have to probe every function, it would raise an
> error related to open file limit. I've tried setting the open file limit to
> unlimited, but the command I used to do so said it was impossible, also the
> current limit is set to 1048576, so I'm guessing that probing every
> function isn't a viable solution.

What problem are you trying to solve?

If you're trying to count how often *every* function is called, and the fact
that one way to do it has an upper limit of a million is a problem, chances are
that you haven't figured out what the *question* is yet.

Usually, the number of calls isn't that important; the total runtime spent in
the function is what matters.  A one-liner inline accessor function that compiles
down to 2-3 machine opcodes can be called tens of thousands of times a second
and not be noticed.  A function that takes milliseconds to complete will be
noticed if it's called only a few dozen times a second.
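A back-of-the-envelope way to see it: time spent per second is calls per second times cost per call. Assuming, purely for illustration, about 1 ns for the inline accessor and 2 ms for the slow function, 30,000 calls/s costs about 30 microseconds per second (0.003% of a CPU), while 40 calls/s costs 80 ms per second (8% of a CPU). cpu_fraction() is a hypothetical helper.

```c
/* Fraction of one CPU consumed by a function, given how often it is called
   and how long each call takes. 1.0 means one core fully busy. */
static double cpu_fraction(double calls_per_sec, double ns_per_call)
{
    return calls_per_sec * ns_per_call / 1e9;
}
```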

If you're trying to figure out how the functions fit together, a static call
graph analysis tool to produce a map of what calls what may be what you need.

Having said that, a kernel built with gcov or ftrace support will give you the
info you need.

See kernel/gcov/Kconfig and http://heim.ifi.uio.no/~knuto/kernel/4.14/dev-tools/gcov.html
if you want to go that route.

Resources for ftrace call counts:

http://www.brendangregg.com/blog/2014-07-13/linux-ftrace-function-counting.html

https://wiki.linaro.org/KenWerner/Sandbox/ftrace and see section 'function profiler'.

Be prepared for your kernel to be quite slow, and for having to do a *lot* of data
reduction.

Note that you'll probably need to run for at least several hours, and of course
the function counts will be *very* dependent on what you do - what gets called
while I'm doing stuff like writing e-mail is very different from what happens
during a kernel compile, and both of those are different from the function
counts that happen when I back up my laptop to an external USB disk.

(Note I've not *tried* any of the above - this laptop is slow enough as it is :)




Re: Page Allocation Failure and Page allocation stalls

2019-05-01 Thread Valdis Klētnieks
On Thu, 02 May 2019 04:56:05 +0530, Pankaj Suryawanshi said:

> Please help me to decode the error messages and reason for this errors.

> [ 3205.818891] HwBinder:1894_6: page allocation failure: order:7, 
> mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)

Order 7 - so it wants 2**7 contiguous pages.  128 4K pages.
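The order-to-size arithmetic as a tiny C helper (4K pages assumed, as in the text; the helper names are made up). An order-7 request is 128 physically contiguous pages, i.e. 512 KiB, while the order-1 stall further down in the log is only 2 pages.

```c
#include <assert.h>

#define PAGE_SIZE_4K 4096UL	/* assumes 4K pages, as in the report */

/* Number of physically contiguous pages an order-N request needs. */
static unsigned long order_to_pages(unsigned int order)
{
	assert(order < 32);
	return 1UL << order;
}

/* The same request expressed in bytes. */
static unsigned long order_to_bytes(unsigned int order)
{
	return order_to_pages(order) * PAGE_SIZE_4K;
}
```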

> [ 3205.967748] [<802186cc>] (__alloc_from_contiguous) from [<80218854>] 
> (cma_allocator_alloc+0x44/0x4c)

And that 3205.nnn tells me the system has been running for almost an hour. Going
to be hard finding that much contiguous free memory.

Usually CMA is called right at boot to avoid this problem - why is this
triggering so late?

> [  671.925663] kworker/u8:13: page allocation stalls for 10090ms, order:1, 
> mode:0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null)

That's a *really* long stall.

> [  672.031702] [<8021e800>] (copy_process.part.5) from [<802203b0>] 
> (_do_fork+0xd0/0x464)
> [  672.039617]  r10: r9: r8:9d008400 r7: r6:81216588 
> r5:9b62f840
> [  672.047441]  r4:00808111
> [  672.049972] [<802202e0>] (_do_fork) from [<802207a4>] 
> (kernel_thread+0x38/0x40)
> [  672.057281]  r10: r9:81422554 r8:9d008400 r7: r6:9d004500 
> r5:9b62f840
> [  672.065105]  r4:81216588
> [  672.067642] [<8022076c>] (kernel_thread) from [<802399b4>] 
> (call_usermodehelper_exec_work+0x44/0xe0)

First possibility that comes to mind is that a usermodehelper got launched, and
it then tried to fork with a very large active process image.  Do we have any
clues what was going on?  Did a device get hotplugged?




Re: radix_tree_next_chunk: redundant search for next slot in hole

2019-05-03 Thread Valdis Klētnieks
On Fri, 03 May 2019 16:03:46 -0500, Probir Roy said:

> While searching for next slot in a hole, it walks through the same
> slots over n over.

How did you determine this?

>   while (++offset < RADIX_TREE_MAP_SIZE) {
> void *slot = rcu_dereference_raw(   /* redundant slot 
> walk */
> node->slots[offset]);
> if (slot)
> break;
> }

Looks to me like the ++offset will walk through each potential slot once,
and break out if it finds one.
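A userspace model of that loop makes the point concrete. Assumptions: a plain array stands in for node->slots, a plain read stands in for rcu_dereference_raw(), MAP_SIZE is an arbitrary 64, and the function name is made up. Each call reads every slot after 'offset' at most once, so walking a whole node touches each slot once in total.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SIZE 64	/* stand-in for RADIX_TREE_MAP_SIZE */

/* Starting just after 'offset', return the index of the next non-NULL
 * slot, or MAP_SIZE if there is none.  '*loads' counts slot reads. */
static unsigned int next_slot(void *const slots[MAP_SIZE],
			      unsigned int offset, unsigned int *loads)
{
	assert(offset < MAP_SIZE);
	while (++offset < MAP_SIZE) {
		void *slot = slots[offset];

		(*loads)++;
		if (slot)
			break;
	}
	return offset;
}
```

With occupied slots at 10 and 40, three successive calls starting at 0 read 10, 30 and 23 slots: 63 loads for 63 candidate slots, no repeats.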

I haven't looked at the code closely, perhaps what you're seeing is repeated
scan/merge/rescan behavior?  Often, compacting a data structure requires
multiple passes.




Re: radix_tree_next_chunk: redundant search for next slot in hole

2019-05-03 Thread Valdis Klētnieks
On Fri, 03 May 2019 19:00:26 -0500, Probir Roy said:
> > > While searching for next slot in a hole, it walks through the same
> > > slots over n over.
> >
> > How did you determine this?
>
> I am working on a tool that identifies repeated load of an address.
> Often these repeated loads are redundant and can be avoided with data
> structure modification. The tool points me to this line.

Is this doing static analysis, or actually doing run-time tracing?

> > Looks to me like the ++offset will walk through each potential slot once,
> > and break out if it finds one.
>
>
> This function is being called by the radix_tree_for_each_slot
> iterator, defined as follows:
>
> #define radix_tree_for_each_slot(slot, root, iter, start)   \
> for (slot = radix_tree_iter_init(iter, start) ; \
>  slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;\
> //   ---^^^
>  slot = radix_tree_next_slot(slot, iter, 0))
>
> Here is the calling context I get:
> |_ depth: 1 :0, method: ext4_block_write_begin+0x335/0x4f0(),
>   |_ depth: 2 :0, method: alloc_buffer_head+0x21/0x60(),
>|_ depth: 3 :0, method: ext4_da_get_block_prep+0x1a6/0x490(),
> |_ depth: 4 :0, method: clean_bdev_aliases+0x9a/0x210(),
>  |_ depth: 5 :0, method: pagevec_lookup_range+0x24/0x30(),
>   |_ depth: 6 :0, method: find_get_pages_range+0x151/0x2d0(),
>|_ depth: 7 :0, method: radix_tree_next_chunk+0x10f/0x360()
>
> Does it explain the case?

Actually, that calling context doesn't tell us much of anything till depth 7.

Yes, next_chunk() and next_slot() can get called repeatedly, especially if it's
a large radix tree. The important question is: Is it being called with the
*same value* of 'slot' repeatedly? Looking at the code, it's pretty obvious
that 'slot' will be updated at least once on every pass through
radix_tree_for_each_slot(), unless the radix tree is corrupted.

If you're trying to do static analysis, your code may be confused by either the
'slot || next_chunk()' iterator, or the fact that 'slot' is assigned both in the for
loop's iterator and in the body of the loop, and thus fail to detect that slot is
updated.
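To see why 'slot' can't get stuck, here is a hypothetical userspace toy (all names invented; "chunks" are int arrays terminated by -1 rather than radix tree nodes) that uses the same for-loop shape as radix_tree_for_each_slot(): 'slot' is assigned in the init clause, inside the condition via 'slot || (slot = ...)', and in the increment clause. Every item is visited exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy: each "chunk" is an int array terminated by -1. */
struct toy_iter {
	int **chunks;
	size_t nchunks;
	size_t chunk;	/* index of the next chunk to load */
	int *cur;	/* current item inside the loaded chunk */
};

static int *toy_iter_init(struct toy_iter *it, int **chunks, size_t n)
{
	assert(chunks != NULL || n == 0);
	it->chunks = chunks;
	it->nchunks = n;
	it->chunk = 0;
	it->cur = NULL;
	return NULL;	/* NULL forces the first chunk load */
}

/* Load the next non-empty chunk; NULL means the walk is over. */
static int *toy_next_chunk(struct toy_iter *it)
{
	while (it->chunk < it->nchunks) {
		int *p = it->chunks[it->chunk++];

		if (*p != -1)
			return it->cur = p;
	}
	return NULL;
}

/* Advance within the current chunk; NULL at the chunk's end. */
static int *toy_next_slot(struct toy_iter *it)
{
	it->cur++;
	return *it->cur != -1 ? it->cur : NULL;
}

/* Same shape as radix_tree_for_each_slot(). */
#define toy_for_each_slot(slot, it, chunks, n)			\
	for (slot = toy_iter_init(it, chunks, n);		\
	     slot || (slot = toy_next_chunk(it));		\
	     slot = toy_next_slot(it))

static int toy_sum(int **chunks, size_t n, int *visits)
{
	struct toy_iter it;
	int *slot;
	int sum = 0;

	*visits = 0;
	toy_for_each_slot(slot, &it, chunks, n) {
		sum += *slot;
		(*visits)++;
	}
	return sum;
}
```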






Re: ULIMIT: limiting user virtual address space (stack inluded)

2019-05-02 Thread Valdis Klētnieks
On Thu, 02 May 2019 09:06:09 -0400, Karaoui mohamed lamine said:

> According to the man page of "ulimit", it is possible to limit the virtual
> address space of a user process: https://ss64.com/bash/ulimit.html
> To limit the virtual address space to 2 GB:
> $ ulimit -SH -v 1048576

It's possible.  It also probably doesn't do what you think it does. In particular,
you may be looking for the -d (data) and/or -m (memory) flags.

Also, to limit it to 2G, you're going to need 'ulimit -v 2097152', since -v counts
in kilobytes (1024-byte units).
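The same conversion in C, plus a rough programmatic equivalent using setrlimit(). On Linux, ulimit -v maps to the RLIMIT_AS resource, and RLIMIT_AS itself takes bytes, not KiB. This is a sketch; the helper names are made up.

```c
#include <assert.h>
#include <sys/resource.h>

/* 'ulimit -v' counts in KiB (1024-byte units). */
static unsigned long long gib_to_ulimit_v(unsigned int gib)
{
	return (unsigned long long)gib * 1024 * 1024;
}

/* Rough equivalent of 'ulimit -SH -v kib' for the calling process.
 * RLIMIT_AS is in bytes.  Returns 0 on success, -1 with errno set. */
static int set_as_limit_kib(unsigned long long kib)
{
	struct rlimit rl;

	assert(kib <= (unsigned long long)-1 / 1024);
	rl.rlim_cur = (rlim_t)(kib * 1024);
	rl.rlim_max = rl.rlim_cur;
	return setrlimit(RLIMIT_AS, &rl);
}
```

gib_to_ulimit_v(2) gives 2097152, and the 1048576 in the question is gib_to_ulimit_v(1), i.e. only 1 GiB.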

> However, it seems that "stack" allocation ignores this limit (I tried with
> both kernel 4.4 and 4.15).

I'm going to go out on a limb and say that what you think is happening
is not in fact what is happening.

> I found this article (patch) that seems to fix the problem:
> https://lwn.net/Articles/373701/

It's unclear what your actual problem is, but given that the patch
has been in the kernel for a decade, it's almost certainly not related.

> My questions are:
> - Why is the stack allocation not placed within the "ulimit" limit? (is it
> a bug?)

Rule of thumb:  Before asking why something happens, verify that it does
in fact happen.

Consider the following C program:

[~] cat > moby-stack.c
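#include <unistd.h>	/* for sleep() */

#define MOBYSTACK 16384
#define BIG 2048

int recurse(int level) {
	double a[BIG];	/* chew up 8 * 2048 = 16K of stack per call (build unoptimized, as below) */

	if (level == MOBYSTACK)
		sleep(999);
	return recurse(level + 1);
}

int main(void) { return recurse(0); }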
^D
[~] gcc moby-stack.c
[~] ulimit -v 16384
[~] ulimit -s 32768
[~] ./a.out
Segmentation fault (core dumped)

Examination of the core file shows it created 895 stack frames, for a total of
895 * 8 * 2048, or about 14,663,680 bytes of stack. That's about what you'd
expect given a virtual limit of 16M, and a bit of space taken up by shared
libraries and so on. Re-running with a virtual limit of 32768 lets it get 1916
stack frames deep. Again, that's about where you'd expect the explosion to
happen with that virtual limit in place. So regardless of the stack limit, it
blows up when it gets to the -v limit.
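The arithmetic above can be checked mechanically. This sketch counts only the 'a' arrays and ignores return addresses, saved registers and everything else in the address space, which is why the real frame counts land below the ideal 1024 and 2048. The 8-byte double is an assumption that holds on x86-64; the helper names are made up.

```c
#include <assert.h>

#define BIG 2048	/* doubles per frame, as in moby-stack.c */

/* Bytes the 'a' arrays alone occupy after 'frames' recursive calls. */
static unsigned long array_stack_bytes(unsigned long frames)
{
	assert(sizeof(double) == 8);	/* the text's 8 * 2048 assumes this */
	return frames * BIG * sizeof(double);
}

/* 'ulimit -v' values are KiB; convert to bytes. */
static unsigned long kib_to_bytes(unsigned long kib)
{
	return kib * 1024UL;
}
```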

Conclusion: The stack *is* included in the -v virtual limit.

> - Is there a way, in user space, to force the stack to be allocated at a
> certain address?

Sure.  mmap() yourself a chunk of anonymous pages at the address you want,
and call a little assembler stub that sets your registers.  You'll probably have
to ensure you allocate enough pages, or handle expanding onto new pages
yourself (including making sure there are free pages to grow into).  And of course,
you're basically stuck with that stack; returning to older functions on the old
stack is going to be *really* hard.  So you'll probably want to do that in main().

Google for 'linux green threads' for the gory details.  Java used them on Linux until 2017
or so.

> - Will the patch above (or similar) ever be integrated into the Linux
> kernel?

See above.  It's been in the kernel for a decade.




Re: Best practice to upgrade kernel

2019-04-26 Thread Valdis Klētnieks
On Fri, 26 Apr 2019 11:17:24 -0700, Xiaolong Wang said:

> I have a newbie question. I’m using a 3.18.66 kernel branch from Qualcomm.
> Over the year I’ve applied some of patched here and there manually.

> But It seems Qualcomm does not patch lot of newly released vulnerabilities. I
> would love to upgrade to the latest 3.18.138 (not 4.x)

> What is the best practice here? 

Since you're being forced by Qualcomm to use 3.18, take it up with them.

What's this? You say you're not forced to do it?  Well, there are two things you
can do:

1) Continue feeling the pain of running an old kernel
2) Fix the reason you're on 3.18, and get on a 4.x or 5.x kernel.

Your choice.




Re: how to determine whether the source code is same between two kernels

2019-05-08 Thread Valdis Klētnieks
On Wed, 08 May 2019 16:52:46 +0800, wuzhouhui said:
> Suppose I have two kernels, one is A.B.C build by people Tom. And
> the other is A.B.C build by Jerry. The source code have been deleted
> after kernel is build and installed. Now I want to know whether the
> source code of these two kernel is the same (even if they have the same
> name). All I have is binaries (e.g. vmlinux, config, *.ko, System.map).
> Is it possible?

The problem is that I can build my kernel with gcc5 (or even a gcc4.mumble),
and the binaries are going to be different than what the same exact source tree
produces with gcc 9.1.1.

Of course, the *correct* solution is to hold both Tom and Jerry to the
requirements of the GPL, and force them to give you the kernel source trees
that went into those kernels that they distributed, and then compare the trees.

What? You say neither one can actually do so? Then why are you accepting and
using kernels from them, rather than shunning them as the GPL violators they
are?





Re: __clksrc_of_table and "clocksource_probe: no matching clocksources found"

2019-04-23 Thread Valdis Klētnieks
On Tue, 23 Apr 2019 17:07:14 -0300, Guilherme Costa said:

> From the code, we can see that the message was printed because
> clocksources == 0. This implies that there were no matching nodes on
> the device tree,

Right.

> and that acpi_probe_device_table returned 0 (which is correct, seeing
> that the kernel has no ACPI support).

> This leaves me with two questions:
> 1 - Is this message a problem indicator? I'm assuming it is, because
> it's printed with a pr_crit...

Depends.  Most systems really want a clocksource of some sort.

> 2 - Why is for_each_matching_node_and_match not getting any matches? I
> did not find where __clksrc_of_table is initialized, so maybe it's
> because it is empty?

My first guess is that your hardware has a busted device tree that doesn't
have an entry pointing at any actual valid hardware clock device.  Could
be an incorrect address, or the DT says it's a Frobozz1 clock but the actual
hardware has a Frobozz2 that won't probe using the Frobozz1 driver, or
something else.
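A toy model of why a Frobozz1-vs-Frobozz2 mismatch yields zero clocksources: matching is an exact string comparison between the node's "compatible" value and the driver's table. Everything here is hypothetical ("acme,frobozz1-timer" is made up, a real of_device_id table has more fields, and real nodes can list several compatible strings).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an of_device_id table entry. */
struct toy_of_id {
	const char *compatible;
};

/* Drivers built into this imaginary kernel; NULL-terminated. */
static const struct toy_of_id toy_clksrc_table[] = {
	{ "acme,frobozz1-timer" },
	{ NULL },
};

/* Count DT nodes whose "compatible" string matches a table entry. */
static int toy_count_clocksources(const char *const *node_compat, size_t n)
{
	int found = 0;

	assert(node_compat != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (const struct toy_of_id *id = toy_clksrc_table; id->compatible; id++)
			if (strcmp(node_compat[i], id->compatible) == 0)
				found++;
	return found;
}
```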



Re: Partition linux page cache for each VM

2019-04-25 Thread Valdis Klētnieks
On Fri, 26 Apr 2019 03:01:52 +0530, Chirag Chauhan said:

> I have selected this as a course project for a course on virtualization.
> Doing this will reduce the performance interference between virtual
> machines.

Actually, it can make it worse.

Consider the behavior of 4 VMs sharing 16G of page cache, where
3 VMs want 4.5G of cache each and 1 wants 2G.

Now consider the behavior of 4 VMs each with a guaranteed 4G of page
cache, where 3 VMs want 4.5G of cache each and 1 wants 2G.

Then perform the same analysis for your LRU list, and make sure it actually
provides a benefit before you start coding.
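Working through those two scenarios with a deliberately idealized model (the shared pool is assumed to be divided perfectly by demand, which real LRU reclaim won't do exactly; sizes in MiB; helper names made up):

```c
#include <assert.h>
#include <stddef.h>

/* Unsatisfied cache demand when every VM gets a fixed private slice. */
static long partitioned_shortfall(const long *want, size_t n, long slice)
{
	long short_mib = 0;

	assert(slice >= 0);
	for (size_t i = 0; i < n; i++)
		if (want[i] > slice)
			short_mib += want[i] - slice;
	return short_mib;
}

/* Unsatisfied demand when all VMs share one pool. */
static long shared_shortfall(const long *want, size_t n, long pool)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += want[i];
	return total > pool ? total - pool : 0;
}
```

With demands of 4608, 4608, 4608 and 2048 MiB, the shared 16384 MiB pool leaves nothing unsatisfied, while fixed 4096 MiB slices leave three VMs 512 MiB short each (1536 MiB total) and 2048 MiB of the fourth VM's slice sits idle.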



Driver for t2uh mt7610e/mt7610e (was Re: Frank

2019-08-03 Thread Valdis Klētnieks
On Sat, 03 Aug 2019 10:57:15 -0400, Pastor Frank Ernesto Ramírez Torres said:

> Hi, I downloaded the Linux 5.2.5 kernel from your website. I have a tplink
> t2uh that I would like to operate on the Debian 9.0 system that I have
> installed. Note that there is an option in the configuration to compile the
> mt7610u driver. But after enabling it in .config, compile it, install it
> and connect the t2uh nothing happens. The dmesg tells me that it tries to
> load the mt7610e first but he does not find it. It changes for mt7610u and
> when trying to load it, error -2 occurs

The first thing missing from this info is any messages from dmesg from when the
driver loads and tries to initialize.

Also, if this is a USB device, plug it in and do an 'lsusb' and provide the
output so we can see what the USB device ID is.  There's a very real
possibility that one of the mt7610 drivers supports the chipset in the device,
but nothing tries to claim it because none of the drivers have that device ID
in their tables.
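A toy illustration of that failure mode: a driver only binds if the vendor:product pair that lsusb reports appears in its ID table. The IDs, table and helper names below are all hypothetical (1234:5678 is not the T2UH's real ID); a real driver's table is a struct usb_device_id array in its source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct toy_usb_id {
	unsigned short vendor, product;
};

/* Hypothetical driver ID table; zero vendor terminates it. */
static const struct toy_usb_id toy_id_table[] = {
	{ 0x1234, 0x5678 },
	{ 0, 0 },
};

/* Pull "vvvv:pppp" out of one line of lsusb output.  Returns 1 on success. */
static int parse_lsusb_id(const char *line, unsigned short *vid,
			  unsigned short *pid)
{
	const char *p;
	unsigned int v, d;

	assert(line && vid && pid);
	p = strstr(line, " ID ");
	if (!p || sscanf(p + 4, "%4x:%4x", &v, &d) != 2)
		return 0;
	*vid = (unsigned short)v;
	*pid = (unsigned short)d;
	return 1;
}

/* Would a driver with toy_id_table claim this device? */
static int driver_claims(unsigned short vid, unsigned short pid)
{
	for (const struct toy_usb_id *id = toy_id_table; id->vendor; id++)
		if (id->vendor == vid && id->product == pid)
			return 1;
	return 0;
}
```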

Also, I fixed the Subject: line for you - the original was uninformative to the
point of looking spammy...





Re: How to handle page allocation when memory exceeds a local node

2019-08-22 Thread Valdis Klētnieks
On Thu, 22 Aug 2019 17:41:22 +0900, Won-Kyo Choe said:

> In my perspective, if the kernel starts to allocate in the remote node,
> I think the scheduler should move the process to the remote node and it
> will allocate a page in the remote node at first in the loop (in the
> process view, the node would be the local now since it is moved). Would
> the scheduler do that?

That's not the scheduler's job.  Plus... what do you do about the
case where a process already has 12G of memory on one node, that node runs out
and 1 4K page gets allocated on another node.  Which is better, move the 12G,
or every once in a while try to relocate that 1 4K page to a better node?
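The asymmetry is easy to quantify (4K pages assumed; helper name made up): moving the 12G means copying about three million pages, against one page for the relocation option.

```c
#include <assert.h>

/* How many 4K pages a migration of 'bytes' would have to copy. */
static unsigned long long pages_to_move(unsigned long long bytes)
{
	assert(bytes % 4096 == 0);
	return bytes / 4096;
}
```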





Re: Staging/netlogic coding style issues with struct

2019-09-03 Thread Valdis Klētnieks
On Tue, 03 Sep 2019 01:26:17 -0300, Pablo Pellecchia said:

> *WARNING: struct  should normally be const#9: FILE:
> platform_net.h:9:+struct xlr_net_data {*
>
> A similar issue is reported when we declare a variable of type struct
> , but in this case warning is reported on the struct definition
> itself.
>
> How can we fix this?

And in today's "How to debug checkpatch" lesson.. :)

First, figure out if checkpatch is in fact correct. It's just a Perl script,
and has no real idea of what the code is.

And double-checking, there's very few 'const struct' declarations in
include/linux/*.h.

So what's going on?  Good question. Actually looking at checkpatch.pl,
we find:

# check for various structs that are normally const (ops, kgdb, device_tree)
# and avoid what seem like struct definitions 'struct foo {'
if ($line !~ /\bconst\b/ &&
    $line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/) {
	WARN("CONST_STRUCT",
	     "struct $1 should normally be const\n" . $herecurr);
}

and $const_structs is initialized from scripts/const_structs.checkpatch.
And that tells us 2 things:  First, this should only be triggering for structures
that are listed in that file, and second, the message *should* say something
like 'struct foo should normally be const', with $1 filling in the struct name.

So why is $1 not showing up? Damned good question.  And the file
checks just fine for me.

[/usr/src/linux-next]2 scripts/checkpatch.pl -f drivers/staging/netlogic/platform_net.h
total: 0 errors, 0 warnings, 0 checks, 21 lines checked

drivers/staging/netlogic/platform_net.h has no obvious style problems and is ready for submission.

Bingo!  This is what happens if the permissions on the file are messed up
and it can't read the file:

[/usr/src/linux-next] scripts/checkpatch.pl -f drivers/staging/netlogic/platform_net.h
No structs that should be const will be found - file '/usr/src/linux-next/scripts/const_structs.checkpatch': Permission denied
WARNING: struct  should normally be const
#9: FILE: drivers/staging/netlogic/platform_net.h:9:
+struct xlr_net_data {

So... you probably need to check the permissions, or if the file is missing
from your tree or empty or something. The version in my tree is 64 lines long.
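If you'd rather check that file from a program than by eye (ls -l and wc -l do the same job from the shell), a small C sketch like this reports whether it can be read and how many lines it has. An error or a count of 0 points at the permission, missing-file or empty-file cases described above. The helper name is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Count lines in 'path' (e.g. scripts/const_structs.checkpatch).
 * Returns -1 and prints the reason if the file can't be opened. */
static long count_lines(const char *path)
{
	FILE *f;
	long lines = 0;
	int c;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f) {
		fprintf(stderr, "%s: %s\n", path, strerror(errno));
		return -1;
	}
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}
```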

Meanwhile, I'm going to go cook up a patch for this.






Re: [OSSNA] Intro to kernel hacking tutorial

2019-09-02 Thread Valdis Klētnieks
On Mon, 02 Sep 2019 15:42:19 +0300, Anatoly Pugachev said:

> is it intentionally that you use
>
> yes "" | make oldconfig
>
> instead of
>
> make olddefconfig

They do something different.  'olddefconfig' just takes the platform or
architecture defconfig and updates it for any new CONFIG_* variables added
since the last time the defconfig was updated in the tree.

yes "" | make oldconfig  does the same updating for new CONFIG_* variables, but
starts with the most recent .config - which produces wildly different results
if the .config had previously been minimized by 'make localmodconfig' or other
similar techniques.





Re: checkpatch.pl CHECK alignment

2019-09-02 Thread Valdis Klētnieks
On Mon, 02 Sep 2019 16:17:00 -, "XenoN. w0w" said:

> Hello all, I am running checkpath.pl against one driver and I’m getting
> CHECK: Alignment should match open parenthesis

> Are these lines worth of changing, what are the odds of that patch being
> accepted. I’m new and I want to contribute so sorry for dumb question

Digression:  If you want to contribute, you should probably first read this:
https://lists.kernelnewbies.org/pipermail/kernelnewbies/2017-April/017765.html

Depends where in the kernel tree it is.

If it's not under drivers/staging, a lot of maintainers won't take the patch for
several reasons:

1) Most of the kernel tree is actually pretty stable and not being worked on,
and there's always a non-zero chance of a fix-the-formatting patch being bad
and changing semantics.  It's rare but does happen.

2) Parts of the kernel are being actively worked on, and formatting patches
can introduce merge conflicts, which usually make maintainers cranky.

3) And long-term, it messes up the output of 'git blame' - rather than showing
you the commit that changed a function call from 3 parameters to 4, now it
shows the commit that moved some spaces around.  This tends to make
developers cranky.

If it *is* under drivers/staging, the patch will probably be accepted.  However, the
fact that it's under drivers/staging means there's probably several metric tons of
stuff that needs fixing, and alignment of continued lines is the least of its problems.

(Just for the record, the exfat patch *was* both sparse and checkpatch clean except
for line-too-long warnings, and look at the long list of stuff I still need to fix :)

Digression 2:
From: "XenoN. w0w" 

That's not going to get accepted on a patch - see section 11 of
Documentation/process/submitting-patches.rst




Re: is there a macro symbol for strlen($LINUX_SOURCE_PATH)

2019-09-05 Thread Valdis Klētnieks
On Thu, 05 Sep 2019 08:31:55 -0600, jim.cro...@gmail.com said:

> So I feel compelled to offer a fix for dynamic_debug, attached.
> hopefully it explains adequately, I have some doubts..
>
> maybe this should go to LKML now,
> but I guess Id prefer to make my obvious thinkos less publicly.
> Im happy to bikeshed the commit-msg or code.

You should find a way to test that this is TRTTD for all gcc releases still
supported for building a kernel (which may mean finding a 4.8 or 4.9 to
test on to see if it uses relative or full paths).

Removing these functions for kernels built with pre-change gcc will cause some
semantic changes. Probably the *right* thing to do is to figure out what
release it was changed in, and do some hacking to include/linux/compiler-gcc.h.

In addition, any such patches should be at least non-hostile to the ongoing
effort to get a kernel tree that builds with clang rather than gcc.





Re: perf_event wakeup_events = 0

2019-09-07 Thread Valdis Klētnieks
On Sat, 07 Sep 2019 09:14:49 -0700, Theodore Dubois said:

> If I’m reading this right, this is a sampling event which overflows 4000
> times a second. But perf then does a poll call which wakes up on this FD with
> POLLIN after 1.637 seconds, instead of 0.00025 seconds.

No, it *takes a sample* 4,000 times a second.  For instance, number of cache line
misses since the last sample.  You get an overflow when the counter wraps because
there have been more than 2^32 events since you read the counter.

At least that's my understanding of it.




Re: perf_event wakeup_events = 0

2019-09-07 Thread Valdis Klētnieks
On Sat, 07 Sep 2019 09:14:49 -0700, Theodore Dubois said:

Reading what it actually says rather than what I thought it said.. :)

   Events come in two flavors: counting and sampled.  A counting event  is
   one  that  is  used  for  counting  the aggregate number of events that
   occur.  In general, counting event results are gathered with a  read(2)
   call.   A  sampling  event periodically writes measurements to a buffer
   that can then be accessed via mmap(2).

For some reason, I was thinking counting events.  -ENOCAFFEINE. :)

> sample_freq is 4000 (and freq is 1). Here’s the man page on this field:
>
>sample_period, sample_freq
>   A "sampling" event is one that generates an overflow notification
>   every N events, where N is given by sample_period.  A sampling
>   event has sample_period > 0.

There's this part:
>   A sampling event has sample_period > 0.  When an overflow occurs,
>   requested data is recorded in the mmap buffer.  The sample_type
>   field controls what data is recorded on each overflow.

So an entry is made in the buffer. It's not clear that this immediately triggers
a signal...

   MMAP layout
   When using perf_event_open() in sampled mode, asynchronous events (like
   counter overflow or PROT_EXEC mmap tracking) are logged into a
   ring-buffer.  This ring-buffer is created and accessed through mmap(2).

   The mmap size should be 1+2^n pages, where the first page is a metadata
   page (struct perf_event_mmap_page) that contains various bits of
   information such as where the ring-buffer head is.

So you need to look at what size mmap buffer is being allocated.  It's *probably*
on the order of megabytes, so that you can buffer a fairly large number of entries
and not take several user/kernel transitions on every single entry...

> If I’m reading this right, this is a sampling event which overflows 4000 
> times a second.

And 4,000 entries are made in the buffer per second..

> But perf then does a poll call which wakes up on this FD with POLLIN after
> 1.637 seconds, instead of 0.00025 seconds

At which point perf goes and looks at several thousand entries in the ring buffer...
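To put numbers on that (n and the page size are caller-chosen assumptions; the man page only fixes the 1+2^n shape; helper names made up): a 1.637 second gap at 4,000 samples per second means roughly 6,548 samples were waiting in the buffer when perf woke up.

```c
#include <assert.h>

/* Bytes to mmap() for a perf ring buffer of 2^n data pages plus the
 * leading struct perf_event_mmap_page metadata page. */
static unsigned long perf_mmap_bytes(unsigned int n, unsigned long page_size)
{
	assert(n < 32);
	return (1UL + (1UL << n)) * page_size;
}

/* Samples that pile up between wakeups at a given sampling rate. */
static unsigned long samples_between_wakeups(unsigned long hz,
					     unsigned long wakeup_ms)
{
	return hz * wakeup_ms / 1000;
}
```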




Re: perf_event wakeup_events = 0

2019-09-07 Thread Valdis Klētnieks
On Fri, 06 Sep 2019 16:28:24 -0700, Theodore Dubois said:
> The man page for perf_event_open(2) says that recent kernels treat a 0
> value for wakeup_events the same as 1, which I believe means it will
> notify after a single sample. However, strace on perf(1) shows that it
> uses wakeup_events=0, and it's definitely not waking up on every
> sample (it seems to be waking up every few seconds.)
> tools/perf/design.txt says "Normally a notification is generated for
> every page filled". Is the documentation wrong, or am I
> misunderstanding something?

   wakeup_events, wakeup_watermark
          This union sets how many samples (wakeup_events) or bytes
          (wakeup_watermark) happen before an overflow notification
          happens.  Which one is used is selected by the watermark bit flag.

          wakeup_events counts only PERF_RECORD_SAMPLE record types.  To
          receive overflow notification for all PERF_RECORD types choose
          watermark and set wakeup_watermark to 1.

          Prior to Linux 3.0, setting wakeup_events to 0 resulted in no
          overflow notifications; more recent kernels treat 0 the same as 1.

My reading of that is that in pre-3.0 kernels, you could choose to not get overflow
notifications, and now you'll get them whether or not you wanted them.
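For experimenting, here is how those knobs look in struct perf_event_attr from <linux/perf_event.h>. It fills the struct the way the question says perf does (freq = 1, sample_freq = 4000, wakeup_events = 0); choosing the CPU-cycles hardware event and PERF_SAMPLE_IP is an assumption, the helper names are made up, and the perf_event_open(2) call itself is left out.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Frequency-mode sampling at 4000 Hz with wakeup_events = 0. */
static void fill_sampling_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->freq = 1;			/* use sample_freq, not sample_period */
	attr->sample_freq = 4000;
	attr->sample_type = PERF_SAMPLE_IP;
	attr->wakeup_events = 0;	/* post-3.0 kernels treat this as 1 */
}

/* The man page's alternative: wake after 'bytes' of records instead. */
static void use_byte_watermark(struct perf_event_attr *attr, __u32 bytes)
{
	attr->watermark = 1;
	attr->wakeup_watermark = bytes;	/* shares storage with wakeup_events */
}
```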

Under "overflow handling", we see:

   Overflows are generated only by sampling events (sample_period must have a nonzero value).

So the reason strace says perf is only waking up every few seconds is probably
because you either launched perf with options that only create trace events, or
it takes several seconds for an overflow to happen on a sampling event. A lot
of those fields are u64 counters, and won't overflow anytime soon.  Even the
u32 counters can take a few seconds to overflow.





Re: newbie-level question about cgroups

2019-07-28 Thread Valdis Klētnieks
On Sun, 28 Jul 2019 15:39:26 -0400, "Robert P. J. Day" said:

>   no point bugging the actual cgroup people about this since it should
> be simple ... if i need *only* cgroup v2, can i dispense entirely with
> everything under /sys/fs/cgroup/ other than /sys/fs/cgroup/unified?

There's a whole mess of CONFIG_CGROUP_* variables, feel free to turn off
those that your system doesn't actually need.

Make sure that you keep a backup kernel in case you turn off something you
shouldn't have.  In particular, systemd seems to want all sorts of cgroups for
no reason other than "because they're there".




Re: wait_event()\ wait_event_interruptible()\ wait_event_interruptible_timeout() and wake_up()

2019-07-29 Thread Valdis Klētnieks
On Mon, 29 Jul 2019 22:48:57 +0530, Muni Sekhar said:
> Let us assume that multiple processes are waiting on wait_event()\
> wait_event_interruptible()\ wait_event_interruptible_timeout(), which
> process gets woken up on calling wake_up()??
>
> I presume wake_up() picks one process, but is there any algorithm to
> pick which process?

Hint:  If you have more than one process waiting, and they do the same thing
(think multiple copies of the same kthread), it probably doesn't matter.

If they do different things and which one gets picked matters for correctness,
you're doing it wrong and probably need some locking.

If they do different things and the results will be correct no matter which
order they're picked, but you want one to go first for latency/throughput
considerations, you have a scheduling/priority issue and probably need to fix
it using the vast plethora of knobs and tools available for that purpose.





Re: wait_event()\ wait_event_interruptible()\ wait_event_interruptible_timeout() and wake_up()

2019-07-29 Thread Valdis Klētnieks
On Mon, 29 Jul 2019 23:37:34 +0530, Muni Sekhar said:
> On Mon, Jul 29, 2019 at 11:31 PM Bharath Vedartham  
> wrote:
> > Sorry to spoil the fun here. But check out what the queue data structure
> > is all about. 'wait_queue' :)
> A wait queue is a doubly linked list of wait_queue_t structures that
> hold pointers to the process task structures of the processes that are
> blocking. Each list is headed up by a wait_queue_head_t structure,
> which marks the head of the list and holds the spinlock to the list to
> prevent wait_queue_t additional race conditions

So... if you're picking the first entry off a linked list, how do you ensure that
the one you want run is the one that gets picked?

Make sure the right one is at the head of the list, of course. ;)
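A toy model of "pick the first entry off the list" (hypothetical names; a plain FIFO, not the kernel's wait queue code) shows the consequence: whoever is at the head gets woken first. The real wake_up() wakes every non-exclusive waiter plus at most one exclusive waiter, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

struct toy_waiter {
	int id;
	struct toy_waiter *next;
};

struct toy_wq {
	struct toy_waiter *head, *tail;
};

/* Append a waiter at the tail. */
static void toy_enqueue(struct toy_wq *q, struct toy_waiter *w)
{
	assert(q && w);
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Wake (dequeue) whoever is at the head; returns its id, or -1. */
static int toy_wake_one(struct toy_wq *q)
{
	struct toy_waiter *w = q->head;

	if (!w)
		return -1;
	q->head = w->next;
	if (!q->head)
		q->tail = NULL;
	return w->id;
}
```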





Re: wait_event()\ wait_event_interruptible()\ wait_event_interruptible_timeout() and wake_up()

2019-08-06 Thread Valdis Klētnieks
On Tue, 06 Aug 2019 14:15:53 +0530, Muni Sekhar said:

> If they do different things, does the waiting threads wakeup order is
> the same in which they were enqueued (FIFO order)?

You missed the point - if you have (for example) something that's waiting for
mouse I/O and something that's waiting for a network packet, under what
conditions will they be using the same wait event?




Re: Query TCP states/connection tracking table in Linux Kernel Module

2019-09-19 Thread Valdis Klētnieks
On Thu, 19 Sep 2019 06:12:46 -, Yadunandan Pillai said:
> I'm developing a proxy system for TCP handshakes.

The programmer's version of "I'm writing the Great American Novel". :)

> However, I'm unable to find a way to verify an incoming ACK packet.

That will depend on exactly what you mean by "verify". Are you just concerned
with the TCP 4-tuple (source/dest port, source/dest address)?  Or are you also
checking that things like the sequence number match?  (Bonus points for doing
the right thing on a kernel that has syncookies enabled, and still working correctly
if syncookies aren't in use.)

> I then ensure that they don't have a payload (therefore , confirming it is a
> handshake packet with ACK flag.

Note that ACK packets with no payload don't mean they're handshake packets.

Look at any FTP transfer - you'll see packets going one way with data, and
just ACK going back the other way.

You need both SYN and ACK for it to be a handshake packet.

iptables --tcp-flags SYN,ACK,... ACK  <- isn't doing what you think it does.
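A sketch of the mask/compare rule behind --tcp-flags MASK COMP (the flag values are the standard TCP header bits; the helper name is made up) shows why matching on flags alone can't separate cases: with SYN,ACK as the mask and ACK as the comparison, a bare ACK from a bulk transfer matches just as well as any other pure ACK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCP header flag bits. */
#define TF_FIN 0x01
#define TF_SYN 0x02
#define TF_RST 0x04
#define TF_PSH 0x08
#define TF_ACK 0x10

/* Look only at the bits in 'mask' and require them to equal 'comp'. */
static bool tcp_flags_match(uint8_t flags, uint8_t mask, uint8_t comp)
{
	assert((comp & ~mask) == 0);
	return (flags & mask) == comp;
}
```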

I'm wondering if you may be in over your head on this one...




Re: Hello, does anyone know any university that has lines of research on the linux kernel

2019-09-28 Thread Valdis Klētnieks
On Sat, 28 Sep 2019 12:45:11 -0600, Manuel Quintero Fonseca said:
> Hello, does anyone know any university that has lines of research on
> the linux kernel

Well.. most of the actual code development is being done out in industry
and by individuals.  The stuff that happens in universities is usually more
theoretical (new concepts in memory management, etc), and merely *uses*
Linux as a platform because it's available.  Pretty much nobody is doing
any research *on* the Linux kernel as itself (unless it's as a case study in
managing large scale software development, or as a data point for code
quality metrics and other such things).

And there's a difference between "University ABC has a professor who's got this
one project that happens to use Linux in it" and "University DEF has 4
professors and 20 grad students who have set up an official Center For
Something Research".  So if you're looking for grad schools, you want to be
looking at things with longevity, like the MIT Media Lab, or Purdue's computer
security expertise, or a lot of the stuff being done at CMU or Stanford or
Berkeley.  It sucks to transfer to a grad school for 3 years, only to have the
project you transferred for go away a year later.

(And many of those projects never see the light of day, because they often end
up being some variant of "If we measured metric X better, we could do a better
job of predicting what to do with Y" - but it often turns out that measuring X
better costs more than the added efficiency of Y gains you)





Re: Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

2019-09-26 Thread Valdis Klētnieks
On Thu, 26 Sep 2019 20:05:57 +1000, Brock said:
> On Thu, 2019-09-26 at 15:45 +0900, Sahibzada Irfanullah wrote:
> > Thank you very much for your help. I have checked ftrace and perf. I
> > think they won't work for me. I am not analyzing/tracing the kernel.
> > I want to develop my own dynamic tool like Pin Tool  (or module
> > which can be loaded/unloaded at run time dynamically), so that I can
> > easily tune/modify for different purposes , and to get any type of
> > specific information from the kernel/KVM, especially in the context
> > of virtualization (guest and/or host memory management). That's why I
> > take a start from generating the log of guest physical addresses from
> > the kernel by saving it to the file; with the passage of time, I will
> > add the functionalities to it.
> > Thank you.
> I'm not sure if it's hardware addresses but you can get kernel/user/kvm
> page fault information with:

I admit I'm still mystified by the requirement for the hardware address rather than
the virtual address, when doing any sort of analytics is going to require mapping
back to a process and virtual address - and possibly incorrectly (consider a page fault
from an instruction in a shared library that's mmap'ed into 250 running processes, like
glibc...)




Re: Fwd: Need Help regarding Reading and Writting to a file from kernel function file

2019-09-24 Thread Valdis Klētnieks
On Tue, 24 Sep 2019 16:10:07 +0900, Sahibzada Irfanullah said:

> subject line. The problem is:  I am trying to write/read page faulted
> physical addresses to a file in a kernel (v5.3-1) function, i.e.,
> handle_ept_violation() which is present in vmx.c. I have followed this
> ,

Just because somebody on stackoverflow gave a guide doesn't mean it's
a good idea.

What problem are you trying to solve here?  Are you trying to write the faulted
pages themselves to a file?  In that case, just creating the file, using something like
'dd if=/dev/zero of=/your/file/here bs=1M count=4096' and then using mkswap
and swapon will probably work much better.

If you're trying to produce a trace of what pages are being faulted, you can
probably do a better job by using 'perf' to produce trace events with a lot of
added data for you, or use debugfs or netlink and a userspace program to read
the data and write it to disk from userspace.





Re: Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

2019-09-24 Thread Valdis Klētnieks
On Tue, 24 Sep 2019 19:10:59 +0900, Sahibzada Irfanullah said:

> My actual goal is to generate log of physical addresses for different
> applications by writing them into the file, and then perform some analysis

What makes you think that the log of physical addresses will tell you anything
useful or interesting?

Hint:  Pretty much all physical pages usable for process space are identical as far
as the kernel is concerned, and if a virtual or disk cache page is pulled into memory
from disk more than once, the same virtual page can end up in a different physical
page each time.

> at runtime in this function by reading the logs from the log file.
> Furthermore, I want a file which size can dynamically grow as the size of
> log increases.

Are you trying to do this *at runtime in real time*, or is post-run analysis OK?





Re: Do I need strong mathematical bases to work in the memory subsystem?

2019-09-29 Thread Valdis Klētnieks
On Sun, 29 Sep 2019 17:48:43 -0500, CRISTIAN ANDRES VARGAS GONZALEZ said:

> Hello good morning, to be developed from the kernel do I need to have good
> math bases? I want to help in the ram memory subsystem and I have that
> doubt thank you.

Depends what you mean by "strong math basics".  You'll *definitely* need to
understand decimal/hexadecimal/binary/octal and how to convert between
them. Understanding algebra is useful.

If you've had some intro to complexity theory so you understand why an O(N^2)
algorithm is usually worse than one that's O(N log N), that helps. Also,
knowing enough computing theory to understand what a finite state machine is,
and why to use one, and how to write code to implement one, is useful.

You *probably* don't need calculus or deep number theory or a lot of other
pure math.

Programming in the kernel doesn't require any more math than what's required
for competent programming in userspace.




Re: Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

2019-09-24 Thread Valdis Klētnieks
On Tue, 24 Sep 2019 20:26:36 +0900, Sahibzada Irfanullah said:

> After having a reasonable amount  of log data,

If you're trying to figure out how the kernel memory manager is working, you're
probably better off using 'perf'  or one of the other tracing tools already in
the kernel to track the kernel memory manager. For starters, you can get those
tools to give you things like stack tracebacks so you know who is asking for a
page, and who is *releasing* a page, and so on.

Of course, which of these tools to use depends on what data you need to answer
the question - but simply knowing what physical address was involved in a page
fault is almost certainly not going to be sufficient.

> I want to perform some type of analysis at run time, e.g., no. of unique
> addresses, total no. of addresses, frequency of occurrences of each address
> etc.

So what "some type of analysis" are you trying to do? What question(s)
are you trying to answer? 

The number of unique physical addresses in your system is dictated by how much
RAM you have installed. Similarly for total number of addresses, although I'm
not sure why you list both - that would mean that there is some number of
non-unique addresses.  What would that even mean?

The number of pages actually available for paging and caching depends on other
things as well - the architecture of the system, how much RAM (if any) is
reserved for use by your video card, the size of the kernel, the size of loaded
modules, space taken up by kmalloc allocations, page tables, whether any
processes have called mlock() on a large chunk of space, whether the pages are
locked by the kernel because there's I/O going on, and then there's things like
mmap(), and so on.

The kernel provides /proc/meminfo and /proc/slabinfo - you're going to want
to understand all that stuff before you can make sense of anything.

Simply looking at the frequency of occurrences of each address is probably not
going to tell you much of anything, because you need to know things like
the total working and resident set sizes for the process and other context.

For example - you do the analysis, and find that there are 8 gigabytes of pages
that are constantly being re-used.  But that doesn't tell you if there are two
processes that are thrashing against each other because each is doing heavy
repeated referencing of 6 gigabytes of data, or if one process is wildly referencing
many pages because some programmer has a multi-dimensional array and is
walking across the array with the indices in the wrong order:

i_max = 4095; j_max = 4095;
for (i = 0; i < i_max; i++)
    for (j = 0; j < j_max; j++)
        sum += foo[i][j];

If somebody is doing foo[j][i] instead, things can get ugly.  And if you're
mixing with Fortran code, where arrays are laid out column-major (the reverse of C)
and you *want* to use 'foo[j][i]' for efficient memory access, it's a bullet loaded
in the chamber and waiting for somebody to pull the trigger.

Not that I've ever seen *that* particular error happen with a programmer
processing 2 terabytes of arrays on a machine that only had 1.5 terabytes of
RAM.  But I did tease the person involved about it, because they *really*
should have known better. :)

So again:  What question(s) are you trying to get answers to?





Re: Linux stable: 4.19 vs 4.14

2019-11-02 Thread Valdis Klētnieks
On Fri, 01 Nov 2019 14:24:26 -0700, Avinash Patil said:
> Hi Greg,

I'm not Greg, but... :)

> I am curious as to why Linux4.19 which was released later has earlier
> EOL than 4.14?

Not all stable releases are kept going for the same amount of time.  Most go
EOL as soon as a few newer releases have come out, while every 5th one or so is
kept going for longer.

> If we have to choose one version over another for BSP, which one is preferred?

If you're planning to dump unsupported crap on customers, it doesn't matter.
Let's face it - if you're not going to provide updates, it doesn't matter when a
stable stream EOLs, or whether you ship 4.19.81 or 4.14.151, because your customers
aren't ever going to get 4.19.104 or 4.14.183.

But you probably want to base the BSP on 4.19 so that your customers get the benefit
of all the stuff that got fixed between 4.14 and 4.19.  Remember that only a *very* small
fraction of fixes - those that qualify under Documentation/process/stable-kernel-rules.rst -
get included in the stable tree.

And of course, unless you have no intention of building similar boards in the future,
it's a good idea to upstream any custom drivers.  That way, when your follow-on
BSP gets based on the 5.11 kernel, your drivers are already in-tree, and even more
importantly, already updated to any 5.11 kernel API changes, because anybody who
changed a kernel API was required to update your driver for you.

(And no, "We only plan to sell 50,000 so it's not worth it" is not a valid excuse.  There's
plenty of stuff that's in-tree that's very niche with only a few users.  Heck, we kept
an entire architecture (the i386 Voyager) around for 2 machines.  Not two models,
two physical machines.  We finally dropped it when James Bottomley was unable to
mix-and-match parts from the two machines to get either one to boot.)





Re: [PATCH 1/1] syscalls: Fix references to filenames containing syscall defs

2019-11-04 Thread Valdis Klētnieks
On Mon, 04 Nov 2019 18:00:05 -0500, Mohammad Nasirifar said:
> Fix stale references to files containing syscall definitions in
> 'include/linux/syscalls.h' and 'include/uapi/asm-generic/unistd.h',
> pointing to 'kernel/itimer.c', 'kernel/hrtimer.c', and 'kernel/time.c'.
> They are now under 'kernel/time'.
>
> Also definitions of 'getpid', 'getppid', 'getuid', 'geteuid', 'getgid',
> 'getegid', 'gettid', and 'sysinfo' are now in 'kernel/sys.c'.
>
> Signed-off-by: Mohammad Nasirifar 
> ---
>  include/linux/syscalls.h  | 8 
>  include/uapi/asm-generic/unistd.h | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)

This patch looks sane, correct, and properly formatted. All in all,
a good first patch. :)

Feel free to add this when you submit it:

Acked-by: Valdis Kletnieks 

As far as where to send it?

Looking at the output of get_maintainer.pl for those two files, I'd use:

To: Arnd Bergmann , Andrew Morton 
cc: linux-...@vger.kernel.org, linux-a...@vger.kernel.org, linux-ker...@vger.kernel.org

(Arnd as maintainer, Andrew because he's well known for having a soft spot for
trivial patches, and the three lists because they're relevant.  I skipped all the
BPF people because although they've done commits to those files, this isn't
really BPF related.)




Re: [PATCH] hwmon: (dell-smm-hwmon) Disable BIOS fan control upon loading

2019-11-12 Thread Valdis Klētnieks
On Tue, 12 Nov 2019 22:41:28 +0100, Giovanni Mascellani said:

> Disable BIOS fan control on such laptops at module loading, and
> re-enable it at module unloading, so that a userspace controller like
> i8kmon can take control of the fan.

The code as written appears to make the disabling unconditional,
which means that if a userspace controller *isn't* started, things will
get toasty really fast. It probably needs a module parameter or something
to control that.





Re: Netlink socket returns NULL in vmx.c kernel file

2019-11-05 Thread Valdis Klētnieks
On Tue, 05 Nov 2019 17:59:43 +0900, Irfan Ullah said:

> Thank you for the response.
> Attached are the files for kernel-user spaces communication.

>   //when I remove this wait the code does not work
>   msleep(3000);

If your code doesn't work, but sticking in random delays makes
it start working, you almost certainly have a race condition or
synchronization issue that should be fixed using proper locking.

> void hello_exit(void)
> {
>   //netlink_kernel_release(nl_sk);

Congratulations. You just leaked a socket here, which is going to
make it difficult to use that socket until you either reboot or find a
way to close it properly before trying to create it again.

> (code generates some warnings, but it is not severe and could be ignored for the time being).

You should do the following:
1) Understand your code well enough so you understand *why* the compiler
issued the warning.
2) Correct your code so the compiler doesn't complain. It almost certainly
understands C better than you do.

gcc 9.2.1 emits one warning on the kernel module code at default warning
levels.  And it's one you *really* need to fix, because it indicates that you
and the compiler are not on the same page about what types your variables are.
Since it's going to go ahead and generate code based on what types *it* thinks
your variables are, you will have nothing but pain and anguish debugging if
you thought they were some other type.

In fact, you may want to compile the kernel module with 'make W=1' to get more
warnings.  If your system has sparse, you should use 'make W=1 C=1'.

And all the warnings this generates are things that shouldn't be seen in clean
kernel code.

I didn't bother looking closely at your userspace.  I gave up
when I saw this:

 14 int sock_fd;
(...)
 68 void user_space_receiver()
 69 {
(...)
 96 user_space_receiver(sock_fd);

There's 2 basic ways to pass a variable to a function. You're trying
to use both of them here.  Pick one and use it properly.

Oh - and there's no possible way to reach the close(sock_fd); on line
77, because the rest of the function loops forever without a break.  At the
very least, you should be checking the return code from recvmsg() and
exiting the while(1) loop if there's an issue.

Bottom line:  You need to get a *lot* more experience writing proper
C code in userspace before you try to write kernel code.




Re: How to get the preprocessor output as part of the compilation process?

2019-12-08 Thread Valdis Klētnieks
On Sun, 08 Dec 2019 14:05:06 -0500, "Frank A. Cancio Bello" said:

> I know that with gcc -E you can get the output of the preprocessor, but what
> I have to do to get that output for every source code file in the Linux kernel
> as part of the compilation process?

What problem are you trying to solve by doing that?  There's probably a
better and more efficient approach.




Re: Free RAM in Linux .

2019-12-16 Thread Valdis Klētnieks
On Tue, 17 Dec 2019 10:39:08 +0530, Neel chakraborty said:

> Does Linux use all of the physical memory (RAM) I have ? In both the
> outputs of /proc/meminfo and free -h , shows that 1.4 gigs is used and 1.6
> gigs is cached , and the rest is "free" out of 32 Gigs . The available ram
> is the cached ram + reclaimable ram + free ram , right ?

That probably means that the processes you have running use a total
of 1.4G of ram, and you've referenced 1.6G of files on disk.

The rest is free because you've not done anything to give the system even
a hint of what to do with the other 27G of RAM.

If you reference a whole bunch of files ('find /usr -type f | xargs cat > /dev/null'
or something similar), you'll see more gigs used for cache.

If you run a few large processes, like a Chrome with 90 tabs open, you'll
see the other number go up.

> And also , does the linux kernel use the amount of ram which is not used by
> applications as paging cache ? Say I have 4 gigs of ram , and Firefox is
> using 1 gig of it , the rest of RAM is used for disk/page caching or is it
> just unused and left there ?

The kernel itself will use some of it, other processes will use some of it, and
if there's any left, it will be used for disk caching - but not until a process
has actually referenced data off the disk.




Re: Missing FDT description in DTB

2019-12-06 Thread Valdis Klētnieks
On Fri, 06 Dec 2019 10:03:34 +0100, Tomek Domek said:

>  And this uboot and spl is somekind of experimental software which is in
> the middle of creation. Could anyone try to guide what might be possible
> the reason of the issue as I am a bit new in u-boot development?

Is there a reason why you're using an experimental uboot/spl rather than
a known-stable working version for whatever hardware this is?

(Of course, if this is bring-up of a new architecture that has never been
supported by uboot, so there's never been a  working uboot for the device, it's
time to quote the movie Animal House: "My advice to you is to start drinking
heavily")





Re: How to get the preprocessor output as part of the compilation process?

2019-12-09 Thread Valdis Klētnieks
On Mon, 09 Dec 2019 13:10:11 +0300, Konstantin Andreev said:
> The universal approach that always works in this and many similar cases is just
> to replace the instrumented binary by your interception shell script.

> E.g. rename gcc to gcc.hide (generally, moving into another location may not
> work) and setup 'gcc' script that does what you want: replaces `-c' with the
> `-E', replaces `-o' argument, etc ..., calls gcc.hide to preprocess source then
> calls gcc.hide with original non-modified command line.

> This is cumbersome process, you can break some things,

And in fact, what you may want to do is have your script invoke gcc
*twice*, once with -E, and then a second time with -c, because otherwise
the build will die the first time it tries to link together two or more non-existent
.o files.

Using 'make -k' *might* also work, but will leave the build log output littered
with a *lot* of error messages.

Or explain why you're doing this - there may be a simpler way to achieve
your goal. For instance, if you're trying to build a cross-reference of which
.c files include which .h files directly or indirectly, there are already specialized tools
for doing that sort of thing, such as 'cxref'.




Re: Can't seem to find a maintainer for init/* files

2019-10-21 Thread Valdis Klētnieks
On Tue, 22 Oct 2019 01:37:33 +1300, Paulo Miguel Almeida said:

> In a second approach:
> I tried making "git log" to list all the commits this particular file was involved in
> (so I could use --follow) but I ended up with loads of commits that change other sections
> of the file (not the lines we were looking for) and because of that I didn't feel like I
> was meeting the "'git blame' to show you the original 6 commits rather than the
> cleanup" rule.

Hint 1:  'git log' supports --no-merges which can simplify things sometimes.

Hint 2:   When specifying a commit, there is a ~ operator that can be used to advantage here.

So once you figure out which commit did the whitespace patching, it's easy to get
git blame to cough up what the tree looked like just before.




Re: Kernel Panic

2019-10-15 Thread Valdis Klētnieks
On Tue, 15 Oct 2019 07:21:18 -, Christophe DUMONT said:

> We're facing Kernel Panic on CentOS 7 since upgrading from 3.10.0-957 to 
> 3.10.0-1062. I'm thinking about a java memory leak, but not sure.
> Do you know what's going on here ?

Well, what made you think "Java memory leak"?

Java is userspace.  If it's leaking memory so badly that the kernel has problems, it would probably:

a) Have been leaking memory and causing problems in -957 as well
b) Died in the OOM (Out Of Memory) code, rather than in the futex() system call.

Yes, poorly written Java code will leak memory like a sieve, but this doesn't smell
anything remotely like a memory leak.

I agree with Valentin that it's probably the bug report he references.




Re: Kernel Panic

2019-10-16 Thread Valdis Klētnieks
On Wed, 16 Oct 2019 07:34:01 -, Christophe DUMONT said:

> What made me think about a memory leak is the message : Java Not Tainted
> 3.10.0-1062.1.1.el7.x86_64.

That just tells you that the currently executing process was java.

It says nothing at all about a memory leak, and as I already mentioned, if Java
was leaking memory, it would almost certainly have been leaking memory on a
previous kernel.

The important part almost always isn't the running process, it's the kernel
stack traceback, which in this case has 'futex' scribbled *all* over it.

General rule of thumb:

If you get more than one crash that has a similar traceback that points at a
specific syscall, or file system driver, etc, the bug is almost guaranteed to
be in that code.

If you get a rash of crashes with *different* tracebacks, you probably have
some other code that's overlaying memory.





Re: Getting netlink socket creation returns NULL in the kernel

2019-10-16 Thread Valdis Klētnieks
On Wed, 16 Oct 2019 13:54:08 +0900, Irfan Ullah said:

> developed kernel space, and user space programs.  Kernel and user space
> programs working perfectly when I load and run these modules from the
> terminal using “sudo insmod kernelmodule.ko”, and “./userspaceApp”
> respectively.  But, when I try to use kernel program  (directly as a header
> file #include "kernelmodule.h") with the kernel original file that is
> “vmx.c” then it returns “NULL” while creating socket (nl_sk = NULL in nl_sk

You're going to have to explain in more detail what you're doing, and possibly
share your code.  "when I try to use kernel program directly" doesn't make sense.





Re: Setting up to debug kernel

2019-10-16 Thread Valdis Klētnieks
On Wed, 16 Oct 2019 18:14:10 -0700, Jerry DeLisle said:

> My first thought is to try to connect via a serial port or similarly a USB based
> port and get some sort of continuous stream of log data that I can capture on
> another machine.

If you have a network, netconsole may be an option.  If you are using UEFI
boot, using pstore to save the panic message might work as well.

> I also suppose that I could use SSH to login remotely and likewise grab logs.

If your kernel is in fact locking up, SSH will probably not be an option.




Re: Try/catch for modules?

2019-10-18 Thread Valdis Klētnieks
On Fri, 18 Oct 2019 13:11:54 -0300, Martin Galvan said:

> I don't think I was clear. My intent is that if a pointer bug isn't
> fixed, my module will fail gracefully and go through the catch block
> instead of panicking the whole system. 

Well... here's the thing.  Unless you have "panic_on_oops" set, hitting a null
pointer will usually *NOT* panic the whole system. In fact, that # in the
Oops message is a counter of how many times the kernel has oopsed already.
Way back in the dark mists of time, I had a system that managed to get it up to
#1500 or so overnight.

The problem is that at that point, a generic "fail gracefully" isn't really an option.

The most graceful generic thing the kernel can do at that point is kill the execution
thread that hit the error.  This can quickly go sideways if that thread held a lock
or similar critical resource.  And no, even though the kernel knows all the locks
the thread held, it *does not* know which ones, if any, are safe to unlock. The
answer is probably "none of them", because locks are usually around the smallest
amount of code possible, so if the lock was held, it's probably unsafe to break it. The
few places in the kernel that do lock-breaking are all things like the printk/sysrq
code, which is basically a "we're dead anyhow, try to get some logging info".

I've seen systems that manage to get the load average up to 17,000 or so, because
process after process got into 'D' state because they tried to do filesystem I/O to
a filesystem that had a lock wedged when a process oopsed. (A good reason for
production systems to have lots of filesystems - if a kernel bug causes the /apps/database/logs
filesystem to hang, you can probably reboot and recover because /apps/database/replay
is synced to disk, and you have lots of stuff in /var that's got forensic info in it.  Use one
big filesystem, and when it locks up, you're immediately dead in the water.  You might not
even be able to ssh in, because ssh hangs trying to write to /var, which is wedged
along with everything else.)

And if you actually *think* about it - a 'try/catch' is semantically *identical* to
coding a parameter test before the event or checking a return code after.

(A good place to interject Tom Duff's First Law of Systems Programming - "Never test
for an error condition you don't know how to handle".  Note in this context, "kill the thread
and pray" means you *do* know how to handle it - by killing the thread...)

Also - say you have a try/catch around a statement.  For some exceptions, such
as an end-of-file or a dropped network connection, it's reasonably easy to know
how to clean up and continue. But what if the statement hits a null pointer
error.   What do you do to clean things up?   You have a bad pointer, and you
have *no way to actually fix it and continue normally*.

And don't get me started on try/catch/throw - that's got even *more* land mines. :)





Re: Try/catch for modules?

2019-10-18 Thread Valdis Klētnieks
On Fri, 18 Oct 2019 12:43:58 -0300, Martin Galvan said:

> goto statements are harmful.

For starters, note that the original paper was written in 1968. Yes, it's over
a half century old now.

Have you actually *read* the paper?

https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf

You probably want to keep in mind that in 1968, most languages in widespread
use (in other words, Fortran and COBOL, effectively) did *not* have any concept
*at all* of "structured programming" supported by language syntax. 

Fortran had exactly one explicit iterative syntax, the DO loop, which was a
subset of the C for loop.

It also had the infamous and dangerous 'arithmetic if' statement, which was of
the format 'IF (expression) A,B,C' - a *three*-way goto to labels A, B,
or C depending on whether 'expression' was negative, zero, or positive.  Yes, this meant
that the following statement *had* to have a goto target label on it in order
to be reachable code.

There was also the computed GOTO, 'GO TO (A,B,C,...), integer variable', which would branch to
label A if the variable was 1, B if it was 2, etc...

And the creeping horror known as 'assigned goto':

      ASSIGN 10 TO N
      ...
      GO TO N ( 10, 20, 30, 40 )
      ...
   10 CONTINUE
      ...
   40 STOP

Now imagine a large subroutine that takes most of a box of punched cards, and
5 or 6 different variables containing assigned targets, and the GOTO can branch
to *anywhere* in the current compilation unit.

(If that isn't bad enough, the IBM Fortran IV G compiler included an extension
called 'debug units', which *really* *did* have the semantics of the joke statement
'COME FROM'.)

Fortran wouldn't get a 'while' for another 15 years or so.  And the COBOL 'PERFORM'
verb is best not approached without first fortifying yourself with strong drink.

That was the reality that Dijkstra was writing about at the time.

Also consider where Dijkstra's head was when he wrote the paper, which also
includes this statement:

"Let us now consider repetition clauses (like, while B repeat A or repeat A
until B). Logically speaking, such clauses are now superfluous, because we can
express repetition with the aid of recursive procedures."

Yeah.  Exactly.

Let's return to 2019.

Feel free to look at function hi_command() in
drivers/media/dvb-frontends/drx39xyj/drxj.c and rewrite it in standard nested
if/then/else if/then/else if/then... form.  Watch out for the fact that the
case statement uses fallthroughs.  Let us know the maximum number of leading
tabs your version uses.

Another good challenge is to unsnarl the gotos in the TLV_PUT macros in
fs/btrfs/send.c into a form that doesn't use goto.

There are plenty of good reasons to use gotos when implementing a
finite state machine.

At some point, purity needs to take a back seat to pragmatism.
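To make the state-machine point concrete, here is a small userspace sketch (not taken from any kernel file) of a recognizer for an optionally signed decimal integer, [+-]?[0-9]+, written with one label per state and one goto per transition. The function name is hypothetical; the point is only that the control flow reads like the state diagram, with no flag variables and no deep nesting.

```c
#include <stdbool.h>

/*
 * Hypothetical example: accept strings matching [+-]?[0-9]+ using a
 * goto-based state machine. Each label is one state; each goto is a
 * transition between states.
 */
static bool is_signed_int(const char *p)
{
	/* START: consume an optional sign, then move to NEED_DIGIT. */
	if (*p == '+' || *p == '-')
		p++;
	goto need_digit;

need_digit:
	/* NEED_DIGIT: at least one digit is required here. */
	if (*p >= '0' && *p <= '9') {
		p++;
		goto in_digits;
	}
	return false;

in_digits:
	/* IN_DIGITS: keep consuming digits; end of string means accept. */
	if (*p >= '0' && *p <= '9') {
		p++;
		goto in_digits;
	}
	return *p == '\0';
}
```

Nothing in it needs more than one level of indentation, which is exactly the property a nested-if rewrite of something like hi_command() tends to lose.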





Re: Can't seem to find a maintainer for init/* files

2019-10-18 Thread Valdis Klētnieks
On Sat, 19 Oct 2019 10:33:00 +1300, Paulo Almeida said:

> 1 - This specific code block has been around for quite some time and many
> additions using the correct printk(KERN_* were made after it was written.
> Does that mean that this code block is an exception and should be left
> as-is for some technical reason? Or, people have somehow forgotten about it
> and I finally found something to do? :)

There's a meta-consideration or two here to think about.

First, many maintainers are not thrilled with trivial patches to code,
especially checkpatch cleanups.  That's because those patches fall into two
major categories:

The patch is against code that's debugged and rock-solid stable.  Most of
do_mounts.c is close to a decade old, and it's only being changed when it's
needed to add an actual feature (such as mounting by partition label in 2018
or mounting a CIFS filesystem this year).  And we *have* had what looked like
"trivial checkpatch cleanup" patches that were buggy and broke stuff.

The other category is "patches against code that's being worked on".  If it's
something that somebody else is working on, it can cause merge conflicts, which
make maintainers grumpy.  So the maintainer only wants to see those cleanups if
they're by the person who's working on the code, at the front of the patch
series, so that (presumably) they don't cause merge conflicts and they've gotten
some compile and run testing.

The other big consideration is git.  Yes, git knows where and when every single
line of code came from.  That doesn't mean it's always easy to get it to cough
up information.

For example:  'git blame init/do_mounts.c'.  That tells you where each line
came from.  Now... imagine a commit that did a spaces-to-tabs cleanup on lines
249 to 257.  'git blame' now lists the cleanup commit, not the 6 commits that
added the original code.

Exercise for the reader:  Determine the easiest way to get 'git blame' to show
you the original 6 commits rather than the cleanup.




Re: Try/catch for modules?

2019-10-17 Thread Valdis Klētnieks
On Thu, 17 Oct 2019 10:37:09 -0300, Martin Galvan said:
> module does e.g. a NULL dereference. The (horribly hackish) way I'm
> doing this right now is registering a die_notifier which will set the
> 'panic_on_oops' variable to 0 if we detect that the current PID
> corresponds to my module. However, this is ugly for many reasons.

For starters, the *correct* in-kernel way to deal with this is:

	if (!ptr) {
		printk(KERN_ERR "You blew it!\n");
		goto you_blew_it;
	}
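Filled out a bit, that pattern becomes the usual kernel unwind ladder: each failure jumps to a label that releases only what was already acquired, in reverse order. A userspace sketch follows; struct ctx, setup(), teardown(), try_alloc(), release(), and the fail_step failure-injection parameter are all hypothetical, added only so the unwinding can be exercised without a real out-of-memory condition.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bookkeeping so a caller can check that nothing leaks. */
static int live_allocs;

/* Hypothetical allocator that fails on purpose when step == fail_step. */
static void *try_alloc(size_t n, int step, int fail_step)
{
	void *p = (step == fail_step) ? NULL : malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void release(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

struct ctx {
	char *a, *b, *c;
};

/* Acquire a, b, c in order; on failure undo only what succeeded. */
static int setup(struct ctx *ctx, int fail_step)
{
	ctx->a = try_alloc(64, 1, fail_step);
	if (!ctx->a)
		goto out;
	ctx->b = try_alloc(64, 2, fail_step);
	if (!ctx->b)
		goto free_a;
	ctx->c = try_alloc(64, 3, fail_step);
	if (!ctx->c)
		goto free_b;
	return 0;

free_b:
	release(ctx->b);
free_a:
	release(ctx->a);
out:
	return -ENOMEM;
}

/* Normal teardown: release in reverse order of acquisition. */
static void teardown(struct ctx *ctx)
{
	release(ctx->c);
	release(ctx->b);
	release(ctx->a);
}
```

The labels fall through into each other, so a failure at step 3 runs free_b and then free_a, while a failure at step 1 skips straight to out.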

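Also, "current PID" and "my module" aren't two things that can correspond: a
module is not a process, and module code runs in the context of whatever
process (or interrupt) happened to call into it.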

For double bonus points - this sort of "ignore the error if it's my process"
means that any other user can trigger the situation - and crash the system.




Re: Netlink socket returns NULL in vmx.c kernel file

2019-10-22 Thread Valdis Klētnieks
On Wed, 23 Oct 2019 11:43:22 +0900, Irfan Ullah said:

> Can you please what's the problem?

To be brutally honest, your code is too unreadable to figure out
what the problem is.

First off, fix your code formatting to match Linux kernel indenting.

Second, there's a lot of code in your .h file that should be in a .c file.

Third, functions should be static when feasible.




Re: Just started w/Linux Kernel (Beginner)

2019-11-24 Thread Valdis Klētnieks
On Sun, 24 Nov 2019 11:58:24 +, Benjamin Selormey said:
> Hello,
>
> I’m a newbie with Linux kernel and I l want to contribute in security 
> research of  the Linux Kernel.

A newbie? Go and read 
https://lists.kernelnewbies.org/pipermail/kernelnewbies/2017-April/017765.html
and Documentation/process/submitting-patches.rst in your git tree.

You *do* have a git tree of some appropriate kernel, right? If not, fix that 
deficiency. :)

> I am interested in memory management and devices communication with the 
> kernel. Does anyone have a starter project in mind I can start with?

Hmm.  Security and memory management? The obvious place to start is to go and
look at all the since-patched cases of vma splits and merges abused for
exploits. Google for 'vma bug linux'.  Read, understand, and look for other
similar issues. Note that you'll probably need to understand them in enough
depth that you can write at least a PoC (proof of concept) exploit that
demonstrates the problem.

Note that you may have trouble finding anything; most of the obvious cases got
pointed out by Solar Designer and Brad Spengler a decade or more ago.





Re: How to printk synchronously

2019-11-27 Thread Valdis Klētnieks
On Tue, 26 Nov 2019 22:31:08 +, Lucas Tanure said:
> Hi,
>
> What about ftrace ? Documentation/trace/ftrace.txt

That won't help - his ^@^@^@ (NUL bytes) is a result of the system stopping and no longer
writing to disk, so his logfile has blocks allocated to it but not yet written
to.

Using ftrace will have the same problem, if his kernel is locking up and not
syncing to disk.

A better approach would be to use netconsole to send all the console output to
another machine, or serial console if that's an option.

Though I have to wonder how he's determining it's a deadlock issue rather than
a panic that's just plain stopping the system - building the kernel with
lockdep support should reveal the issue before a lockup. And if it *is* a
deadlock, there should be sufficient info in the watchdog logging (assuming the
hardware has some sort of watchdog support, as most systems do).





Re: Kernel TLS

2019-11-29 Thread Valdis Klētnieks
On Fri, 29 Nov 2019 23:37:35 -0500, Jeffrey Walton said:

> On Fri, Nov 29, 2019 at 3:04 PM Jeffrey Walton  wrote:
> > ...
> > So now I am at:
> >
> > $ gcc -Wall -g2 -O1 ktls.c -o ktls
> > $ ./ktls
> > setsockopt failed, 524, Unknown error 524
>
> Now open in the Fedora bug tracker:
> https://bugzilla.redhat.com/show_bug.cgi?id=1778348

Looks like the 'unknown error' issue is a glibc strerror() problem. On the
kernel side, git blame says:

 [/usr/src/linux-next] git blame include/linux/errno.h | grep -C 5 524
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 22)
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 23) /* Defined for the NFSv3 protocol */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 24) #define EBADHANDLE   521 /* Illegal NFS file handle */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 25) #define ENOTSYNC     522 /* Update synchronization mismatch */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 26) #define EBADCOOKIE   523 /* Cookie is stale */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 27) #define ENOTSUPP     524 /* Operation is not supported */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 28) #define ETOOSMALL    525 /* Buffer or request is too small */
^1da177e4c3f4 (Linus Torvalds 2005-04-16 15:20:36 -0700 29) #define ESERVERFAULT 526 /* An untranslatable error occurred */

So I'm mystified why glibc's strerror() doesn't handle it.
Though I think Alexander is correct on why the kernel returns ENOTSUPP.

I've updated the bugzilla entry.






Re: Kernel TLS

2019-11-30 Thread Valdis Klētnieks
On Sat, 30 Nov 2019 09:13:35 +0100, Bjørn Mork said:

> include/linux/errno.h is kernel internal only.  The UAPI header is
> uapi/linux/errno.h, which is an alias for uapi/asm/errno.h.  There is no
> 524 in include/uapi/asm-generic/errno.h or
> include/uapi/asm-generic/errno-base.h
>
> The codes in include/linux/errno.h should be translated for userspace.
> This does look like a bug in the kernel tls code.

Hmm... git grep ENOTSUPP has 1,516 hits.  I haven't checked if it
gets translated in one place, or if it gets done in a kazillion places.
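One way to read Bjørn's point: a kernel-internal code like ENOTSUPP (524) is supposed to be mapped to a userspace errno before it crosses the syscall boundary, and EOPNOTSUPP is the obvious candidate. The sketch below only illustrates what such a mapping looks like; KERNEL_ENOTSUPP, to_user_errno(), and show() are hypothetical names, not kernel APIs, and whether EOPNOTSUPP is the right target for the TLS case is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the kernel-internal value in include/linux/errno.h. */
#define KERNEL_ENOTSUPP 524

/*
 * Hypothetical translation step. Kernel code returns negative error
 * codes; map internal-only ones to a value userspace can decode and
 * pass everything else through unchanged.
 */
static int to_user_errno(int err)
{
	if (err == -KERNEL_ENOTSUPP)
		return -EOPNOTSUPP;
	return err;
}

/* Print what a userspace program would get from strerror() for a return value. */
static void show(int ret)
{
	printf("%d -> %s\n", -ret, strerror(-ret));
}
```

With glibc, show(-KERNEL_ENOTSUPP) should print the same "Unknown error 524" text seen in the bug report, while show(to_user_errno(-KERNEL_ENOTSUPP)) gets a real message.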




Re: Kernel TLS

2019-11-30 Thread Valdis Klētnieks
On Sat, 30 Nov 2019 11:10:50 +0100, Bjørn Mork said:

> My version of setsockopt(2) says
(...)
> ERRORS
>EBADF The argument sockfd is not a valid file descriptor.
>

Note that there is no general *guarantee* that a syscall cannot
return any values other than the ones in the manpage.

> If you look at e.g. udp_lib_setsockopt() you'll see that it conforms
> strictly to this.  I don't know why do_tcp_setsockopt() doesn't.

Probably because those are the only errors that the UDP version
can hit, but the TCP case can hit cases like "socket must be in
a connected state" and possibly other error codes.  Now, I admit to
wondering why it uses ENOTSUPP rather than ENOTCONN for this
particular case.




Re: Just joined

2019-11-19 Thread Valdis Klētnieks
On Tue, 19 Nov 2019 17:03:37 +0530, Akash Sarda said:
> Hi,
>
> My name is Akash, and I want to start with OS development..
> I am interested in memory management, and would like to know if anyone
> has a newbie project in their mind..

Unfortunately, there's probably not much good newbie work in memory management,
because a whole lot of experts have already gone over it and made it work
reasonably well on *literally* everything from light bulbs to supercomputers.

I'm not saying there's nothing in there for a newbie to do. There are probably
still tons of minor enhancements that can be done, but they're going to require
that you actually understand the code at a fairly deep level. For example,
here's a recent commit:

commit abc04c84ae77fdbce2c42c52e4059d327e54c7ab
Author: Minchan Kim 
Date:   Wed Nov 6 16:06:48 2019 +1100

    mm/page_io.c: annotate refault stalls from swap_readpage

    If a block device supports rw_page operation, it doesn't submit bios so
    the annotation in submit_bio() for refault stall doesn't work.  It happens
    with zram in android, especially swap read path which could consume CPU
    cycle for decompress.  It is also a problem for zswap which uses
    frontswap.

    Annotate swap_readpage() to account the synchronous IO overhead to prevent
    underreport memory pressure.

The description of what was changed and why runs to just under 500 characters,
while the actual change is well under 200.

I'm assuming you've already cloned either Linus's git tree or one or more of
the development trees.   If so, you can do a 'git log mm/' and see what work
has been recently done, so you know what sort of learning curve you're
going to have.

You definitely need to read the various files under Documentation/process
and you probably should go read this as well:

https://lists.kernelnewbies.org/pipermail/kernelnewbies/2017-April/017765.html




Re: Generating Log of Guest Physical Addresses from a Kernel Function and Perform Analysis at Runtime

2019-09-25 Thread Valdis Klētnieks
On Wed, 25 Sep 2019 11:38:24 +0200, Greg KH said:
> Try using ftrace and tracing in general first, before messing around
> with netlink.  ftrace does not require any kernel changes at all, why
> would you _not_ want to try that?  :)

Beginning programmers write complicated messy code.
Good programmers write clean readable code.
Great programmers avoid writing code in the first place.

Something that many beginners don't seem to have a handle on. :)





Re: Software Prefetching using Machine learning

2019-10-09 Thread Valdis Klētnieks
On Thu, 10 Oct 2019 11:48:11 +0900, Irfan Ullah said:

> it can it be possible to share any type of data structure (e.g., structure,
> array, etc.) between  kernel space C program and user space Python program
> in two way communication.

Python's split() is probably the wrong tool here. You probably are looking for
something like this:

https://pymotw.com/2/struct/ - the guys at Python Module Of The Week have you
covered.

And a more detailed write-up:
http://archive.oreilly.com/oreillyschool/courses/Python3/Python3-08.html
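On the C side, what makes this work is giving the record a fixed, explicit byte layout instead of relying on the compiler's in-memory struct layout, since padding and byte order can differ. A minimal sketch, assuming a hypothetical record of two 64-bit values (a faulting address and an instruction pointer) sent little-endian; struct fault_record, put_le64(), and pack_fault_record() are made-up names. Python could then decode each 16-byte record with struct.unpack('<QQ', data). In-kernel code would more likely use the kernel's cpu_to_le64() helpers than a hand-rolled put_le64().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record to hand to userspace. */
struct fault_record {
	uint64_t addr;	/* faulting address */
	uint64_t ip;	/* instruction pointer */
};

/* Size of the serialized form: two 8-byte little-endian fields. */
#define FAULT_RECORD_WIRE_SIZE 16

/* Store v as 8 little-endian bytes, independent of host byte order. */
static void put_le64(uint8_t *out, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Serialize r into buf; returns bytes written, or 0 if buf is too small. */
static size_t pack_fault_record(const struct fault_record *r,
				uint8_t *buf, size_t len)
{
	if (len < FAULT_RECORD_WIRE_SIZE)
		return 0;
	put_le64(buf, r->addr);
	put_le64(buf + 8, r->ip);
	return FAULT_RECORD_WIRE_SIZE;
}
```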




Kernel development tools (was Re: Software Prefetching using Machine learning)

2019-10-09 Thread Valdis Klētnieks
On Thu, 10 Oct 2019 11:48:11 +0900, Irfan Ullah said:
> @All,* There is one thing I want to share, although it is not too relevant
> but worth to share,*  that very limited number of *easy-to-use-&-understand*
> tools and libraries available to welcome  and facilitate the
> newbies/freshmen in the kernel development as compare to other development
> environments.

Well... for better or worse, the Linux kernel is an environment where
programmers are expected to have a fairly good grasp on programming and
software development already, and can figure most things out on their own.

Having said that, if you have specific suggestions of tools and libraries that
would make a difference, feel free to state what you think is missing - there's
a good chance that it actually exists but you didn't know about it.





Re: Software Prefetching using Machine learning

2019-10-09 Thread Valdis Klētnieks
On Wed, 09 Oct 2019 12:37:51 +0900, Irfan Ullah said:

> Thanks in advance. I am a PhD candidate, and currently I have started
> working on kernel development. My professor told me to implement this paper
> . In this paper authors have used machine
> learning to predict the next missed addresses.

From the abstract:

On a suite of challenging benchmark datasets, we find that neural networks
consistently demonstrate superior performance in terms of precision and recall

Delving further into the paper, we discover that the researchers have learned
that if you run Spec CPU2006 enough times, a neural network can learn what
memory access patterns Spec CPU2006 exhibits.

But they don't demonstrate that the patterns learned transfer to any other
programs.  And nobody sane runs the exact same program with the exact same
inputs repeatedly unless they're doing benchmarking.

Ah, academia - where novelty of an idea is sufficient to get published, and
considerations of whether it's a *useful* idea are totally disregarded.

> 1) How can I directly store the missed addresses, and instruction addresses
> from kernel handle_mm_fault() to a file?

Don't do that.  Pass the data to userspace via netlink or debugfs or shared
memory or other means, and have userspace handle it.

> 2) How can I use machine learning classifier in the kernel for predicting
> addresses?

Well... in general, you won't be able to do much actually *useful*, because of
time scales.  If your predictor says "Oh, program XYZ will need page 1AB83D 20
milliseconds from now", but it takes 10 milliseconds to bring a page in, your
predictor has only 10 milliseconds to make the prediction in order to be
useful.

And in fact, you probably have even less, because your predictor has to be fast
enough and use little enough memory that it doesn't significantly affect CPU,
cache, or RAM usage.
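The timing constraint above is plain arithmetic. A sketch with hypothetical helper names: the budget is the lead time minus the fetch latency, and a predictor slower than that budget is useless no matter how accurate it is.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: how long (ms) a predictor may take before its
 * answer arrives too late. lead_ms is how far ahead of the access the
 * prediction is made; fetch_ms is how long bringing the page in takes.
 * A negative result means no prediction can ever be in time.
 */
static double prediction_budget_ms(double lead_ms, double fetch_ms)
{
	return lead_ms - fetch_ms;
}

/* True if a predictor taking predict_ms still leaves time to fetch the page. */
static bool prediction_useful(double lead_ms, double fetch_ms,
			      double predict_ms)
{
	return predict_ms <= prediction_budget_ms(lead_ms, fetch_ms);
}
```

With the 20 ms / 10 ms numbers from the example, the budget is 10 ms; and that is before charging the predictor for the CPU, cache, and RAM it steals from the program being predicted.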

> 3) Is there any way to do the machine learning in the user space in python,
> and then transfer the classifier in bytes forms to the kernel space for address
> predictions ?

Sure, there's plenty of ways, from using shared memory to creating an ioctl().

But all of them are going to have the same "you need to do it in less time than
it takes for the program you're predicting to reach the point for the
prediction" constraint.

Good luck, you will need it.




Re: Predicting Process crash / Memory utlization using machine learning

2019-10-09 Thread Valdis Klētnieks
On Wed, 09 Oct 2019 01:23:28 -0700, prathamesh naik said:
> I want to work on project which can predict kernel process
> crash or even user space process crash (or memory usage spikes) using
> machine learning algorithms. 

This sounds like it's isomorphic to the Turing Halting Problem, and there's
plenty of other good reasons to think that predicting a process crash is, in
general, somewhere between "very difficult" and "impossible".

Even "memory usage spikes" are going to be a challenge.

Consider a program that's doing an in-memory sort. Your machine has 16 gig of
memory, and 2 gig of swap.  It's known that the sort algorithm requires 1.5G of
memory for each gigabyte of input data.

Does the system start to thrash, or crash entirely, or does the sort complete
without issues?  There's no way to make a prediction without knowing the size
of the input data.  And if you're dealing with something like 

grep  file | predictable-memory-sort

where 'file' is a logfile *much* bigger than memory.

You can see where this is heading...
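Under deliberately crude assumptions (the sort gets all 16G of RAM plus the 2G of swap, and nothing else on the box needs memory), the three outcomes can be written down, which makes it obvious that the input size is the only thing deciding between them. predict_sort() is a hypothetical helper; a real system also has page cache, other processes, and overcommit to muddy the picture.

```c
/* Hypothetical crude model of the in-memory sort example. */
enum sort_outcome {
	SORT_IN_RAM,	/* completes without touching swap */
	SORT_SWAPS,	/* fits in RAM + swap, but swaps (and may thrash) */
	SORT_OOM,	/* does not fit at all */
};

static enum sort_outcome predict_sort(double input_gb, double ram_gb,
				      double swap_gb, double factor)
{
	double need_gb = input_gb * factor;	/* e.g. 1.5G per G of input */

	if (need_gb <= ram_gb)
		return SORT_IN_RAM;
	if (need_gb <= ram_gb + swap_gb)
		return SORT_SWAPS;
	return SORT_OOM;
}
```

With a factor of 1.5, anything up to about 10.7G of input stays in RAM, up to 12G it swaps, and beyond that it runs out of memory. None of which helps if the input is whatever grep happens to emit.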

Bottom line:  I'm pretty convinced that in the general case, you can't do much
better than current monitoring systems already do: Look at free space, look at
the free space trendline for the past 5 minutes or whatever, and issue an alert
if the current trend indicates exhaustion in under 15 minutes.
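A minimal sketch of that trendline check, assuming evenly spaced samples of free memory over the window: fit a least-squares line, and alert if following that slope from the latest sample reaches zero within the alert horizon. The function name, the sampling scheme, and the choice of a linear least-squares fit are all assumptions; real monitoring systems vary.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical trend check. free_mb[i] is free memory sampled every
 * interval_s seconds, oldest first. Fit free = a + slope * t by least
 * squares; if free memory is falling, project how long until it hits
 * zero starting from the latest sample, and alert if that is sooner
 * than horizon_s seconds.
 */
static bool exhaustion_alert(const double *free_mb, size_t n,
			     double interval_s, double horizon_s)
{
	double sum_t = 0.0, sum_f = 0.0, sum_tt = 0.0, sum_tf = 0.0;

	if (n < 2 || interval_s <= 0.0)
		return false;	/* not enough data to fit a line */

	for (size_t i = 0; i < n; i++) {
		double t = (double)i * interval_s;

		sum_t += t;
		sum_f += free_mb[i];
		sum_tt += t * t;
		sum_tf += t * free_mb[i];
	}

	double denom = (double)n * sum_tt - sum_t * sum_t;
	double slope = ((double)n * sum_tf - sum_t * sum_f) / denom; /* MB/s */

	if (slope >= 0.0)
		return false;	/* flat or growing: no exhaustion predicted */

	double seconds_left = free_mb[n - 1] / -slope;

	return seconds_left < horizon_s;
}
```

For a 5-minute window sampled once a minute, that is 6 samples with interval_s = 60 and horizon_s = 900. Free memory falling 200 MB per minute and ending at 1000 MB projects exhaustion in 300 seconds, so it alerts; a flat series never does. The 60/900 pair is exactly the kind of knob the next paragraph is talking about tuning.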

Now, what *might* be interesting is seeing if machine learning across multiple
events is able to suggest better values than 5 and 15 minutes, to provide a
best tradeoff between issuing an alert early enough that a sysadmin can take
action, and avoiding issuing early alerts that turn out to be false alarms.

The problem there is that getting enough data on actual production systems
will be difficult, because sysadmins usually don't leave sub-optimal
configuration settings in place so you can gather data.

And data gathered for machine learning on an intentionally misconfigured test
system won't be applicable to other machines.

Good luck; this problem is a lot harder than it looks.




Re: Hello, does anyone know any university that has lines of research on the linux kernel

2019-10-03 Thread Valdis Klētnieks
On Thu, 03 Oct 2019 16:09:28 -0600, Manuel Quintero Fonseca said:
> Thank you all for answering, my question is, if I want to do a phD
> that has to do with the Linux kernel, it will be more difficult to be
> accepted at a university, where they do not have lines of research on
> Linux, those where they openly have research on The Linux kernel

Getting accepted into a university's PhD program, and getting a thesis
project approved, are two very different things.

And you still have to figure out whether you want to do research specifically
on the Linux kernel itself, or your topic of interest is something else and
Linux is just a useful tool.

For example:  Using Linux as a basis for testing new ideas in virtual memory
management is different from a project studying the Linux memory manager.
And since a PhD thesis is supposed to be new original research, you'll have
a lot easier time selling a "new ideas in virtual memory management" thesis
than a "studying existing code" thesis.

So what is it *exactly* that you wanted to do a thesis on?




Re: Test the kernel code

2019-10-04 Thread Valdis Klētnieks
On Fri, 04 Oct 2019 22:07:21 -0500, CRISTIAN ANDRES VARGAS GONZALEZ said:

> I am starting to develop the kernel and I have been compiling the kernel,
> now it has been compiling for some time now. When a patch is added, should
> everything be compiled again? Or is there a different way to test the code
> that has been written?

The kernel build is driven by 'make', which is a dependency-driven program that
only rebuilds things which have changed dependencies.  How much actually gets
rebuilt depends on what exactly the patch changes.

If the patch changes one .c file, it probably won't rebuild anything else.  If
the patch touches a major .h file that's included in a lot of things, both
directly *and* indirectly from other .h files, you will probably be looking at
a long rebuild as every .c file that includes the affected .h file gets
recompiled.

One crucial point to keep in mind - make is *not* smart enough to understand
that foo.c references 3 structures defined in bar.h - and that the patch
touches some other structure in bar.h that isn't used in foo.c.  All it knows
is that foo.c #includes bar.h, and bar.h was modified (via checking the
timestamps), and thus a rebuild of foo.o is called for. If any of the
dependencies of foo.o (foo.c itself, usually the included .h files, but other
dependencies can be specified in the Makefile) has a newer last-modified
timestamp than foo.o, foo.o gets rebuilt by recompiling foo.c.
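The timestamp rule boils down to one comparison per prerequisite. A sketch of just that decision, with a hypothetical needs_rebuild() helper (GNU make's internals are organized differently, and kbuild layers extra checks on top, such as noticing changed compiler flags): the target is rebuilt if it is missing or if any prerequisite is strictly newer than it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper mirroring make's basic rule: a target is out of
 * date if it does not exist, or if any prerequisite was modified more
 * recently than the target. Only modification times are compared;
 * make never looks at what changed inside the files.
 */
static bool needs_rebuild(bool target_exists, time_t target_mtime,
			  const time_t *dep_mtimes, size_t ndeps)
{
	if (!target_exists)
		return true;

	for (size_t i = 0; i < ndeps; i++) {
		if (dep_mtimes[i] > target_mtime)
			return true;	/* e.g. bar.h touched after foo.o was built */
	}
	return false;
}
```

A prerequisite with the same timestamp as the target does not trigger a rebuild; it has to be strictly newer.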

And then there's some changes that will end up forcing a rebuild of pretty much
everything in sight (for instance, anything that touches the top-level
Makefile, or certain other similar crucial files).

If you're not familiar with 'make', it's probably time you learned... :)





Re: Remote I/O bus

2019-10-05 Thread Valdis Klētnieks
On Sun, 06 Oct 2019 00:29:18 +0200, Luca Ceresoli said:

> BTW I guess having an FPGA external to the SoC connected via SPI or I2C
> is not uncommon. Am I wrong?

Look at it this way - as a practical matter, if you have an FPGA, it's probably
going to be hanging off an SPI, I2C, or PCI.  And if you're an SoC, especially
at the low end, PCI may be too much silicon to bother with.

Oddly enough, I've not seen any FPGA over USB.  That of course doesn't mean
that some maniac hasn't tried to do it :)





Re: Do I need strong mathematical bases to work in the memory subsystem?

2019-10-02 Thread Valdis Klētnieks
On Wed, 02 Oct 2019 21:47:42 -0400, Ruben Safir said:

> I've heard this for years and when I went back for my PhD and Masters
> degree in comp sci, I found out, low and behold, this is just not true.

The question was specific to *kernel* development.

Look around.  Does Linus have a PhD?  How many people at the last Linux
Plumbers Conference or Kernel Summit have PhDs?

I'm willing to bet that there are very few PhDs in CS listed in MAINTAINERS.  And
those that are, are probably coincidental...

> If you hope to do anything that is not elementry, you need serious math
> for the algorithms, not to mention to complete the jobs being done.
>
> Knowing math is the real key to unlocking to potential of the power of
> computational mathmatics.

If you're doing that sort of mathematics *inside the kernel*, there's probably
something wrong with your overall design.

Just sayin'.




Re: Do I need strong mathematical bases to work in the memory subsystem?

2019-10-03 Thread Valdis Klētnieks
On Thu, 03 Oct 2019 06:55:50 -0400, Ruben Safir said:

> I wouldn't call that C code basic.  Regardless, showing an example of a
> driver that doesn't need math, and it might if you understood the high
> level math, and your not aware of it, but predictive branching would
> need it.  

See the kernel code that maintains statistical data on likely()/unlikely()
under CONFIG_PROFILE_ANNOTATED_BRANCHES. Seems like "this likely() actually
only triggers 3% of the time" isn't exactly higher math.
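To show how little math is involved, here is a userspace sketch in the spirit of CONFIG_PROFILE_ANNOTATED_BRANCHES: wrap a branch annotation so it counts how often the annotated expectation turned out to be right. The macro and struct names are hypothetical and the real kernel implementation differs in detail; __builtin_expect() is the GCC/Clang builtin that the kernel's likely()/unlikely() are defined with, and the ({ ... }) statement expression is a GNU extension the kernel also relies on.

```c
/* Hypothetical per-call-site counters. */
struct branch_stat {
	unsigned long correct;
	unsigned long incorrect;
};

/*
 * Evaluate cond once, record whether it matched the expected value,
 * then pass the result and the hint on to the compiler.
 */
#define PROFILED_EXPECT(stat, cond, expect) ({			\
	long pe_r = !!(cond);					\
	if (pe_r == (expect))					\
		(stat)->correct++;				\
	else							\
		(stat)->incorrect++;				\
	__builtin_expect(pe_r, (expect));			\
})

#define profiled_likely(stat, cond)	PROFILED_EXPECT(stat, cond, 1)
#define profiled_unlikely(stat, cond)	PROFILED_EXPECT(stat, cond, 0)

/* Percentage of evaluations where the annotation was right. */
static unsigned percent_correct(const struct branch_stat *s)
{
	unsigned long total = s->correct + s->incorrect;

	return total ? (unsigned)(100 * s->correct / total) : 0;
}
```

Annotating "i < 3" as likely inside a 100-iteration loop records 3 correct and 97 incorrect, i.e. 3%: counting and one division, nothing more.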

There may be some magic going on in the chip hardware - but that's in the
*hardware* and inaccessible to the programmer.  I'll also point out that
speculative execution has *other* problems.

> You can not calculate simple interest efficiently without calculus. 

Simple interest is *easy*.  Amount * percent.  Done.  It's compound interest
that only sort of needs calculus (and there only to understand the limiting
case) - and even there I doubt any banks actually use calculus, just apply the
iterative approach.

//  yearly interest compounded monthly
for (i = 0; i < 12; i++)
	amount += amount * (rate / 12);

> calculus.  This repeadely ends up being an issue of "if I don't know it,
> I don't need it", which is wrong.  More math helps you every time.  Math

Somehow I doubt that the Taniyama-Shimura-Weil conjecture is ever
going to have any relevance inside the kernel.

> is advanced logic.  I can't tell you how many times I see folks brute
> force their way to solutions that they should be using integration.

Can you show an example of where the kernel needs to be using integration?




Re: Remote I/O bus

2019-10-04 Thread Valdis Klētnieks
On Fri, 04 Oct 2019 17:08:30 +0200, Luca Ceresoli said:
> Yes, the read/write helpers are nicely isolated. However this sits in a
> vendor kernel that tends to change a lot from one release to another, so

I admit having a hard time wrapping my head around "vendor kernel that
changes a lot from one release to another", unless you mean something like
the Red Hat Enterprise Linux releases that update every 2 years, and at that
point you get hit with a jump of 8 or 10 kernel releases.

And of course, the right answer is to fix up the driver and upstream it, so that
in 2022 when your vendor does a new release, the updated driver will already be
there waiting for you.

And don't worry about having to do patches to update the driver to a new kernel
release because APIs change - that's only a problem for out-of-tree drivers.
If it's in-tree, the person making the API change is supposed to fix your
driver for you.





Re: Need help to get started

2020-02-24 Thread Valdis Klētnieks
On Tue, 25 Feb 2020 12:08:53 +0530, m vivek said:

> I just joined the mailing list. Excited to contribute to linux kernel but
> finding it difficult to figure out where to start.
>
> Any help would be appreciated.

Pretty much anything under drivers/staging is fair game - it's *all* code
that needs fixing (if it didn't need fixing to be up to most kernel standards,
it wouldn't be in drivers/staging :)

Also, you probably want to go read this:

https://lists.kernelnewbies.org/pipermail/kernelnewbies/2017-April/017765.html




Re: Ordering guarantee inside a single bio?

2020-01-27 Thread Valdis Klētnieks
On Sun, 26 Jan 2020 13:07:38 +0100, Lukas Straub said:

> I am planing to write a new device-mapper target and I'm wondering if there
> is a ordering guarantee for the operation inside a single bio? For example if 
> I
> issue a write bio to sector 0 of length 4, is it guaranteed that sector 0 is
> written first and sector 3 is written last?

I'll bite.  What are you doing where the order of writing out a single bio 
matters?




Re: Ordering guarantee inside a single bio?

2020-01-29 Thread Valdis Klētnieks
On Tue, 28 Jan 2020 13:50:56 +0900, 오준택 said:

(Lukas - there's stuff for you further down...)

> If you write checksum for some data, ordering between checksum and data is
> not needed.

Actually, it is.

> When the crash occurs, we just recalculate checksum with data and compare
> the recalculated one with a written one.

And it's required because the read of the data that gets a checksum-data
mismatch may be weeks, months, or even years after a crash happens.  You don't
have any history to go on, *only* the data as found and the two checksums.

You can't safely just recalculate the checksum, because that's the whole
*point* of the checksum - to detect that something has gone wrong.  And if it's
the data that has gone wrong, just recalculating the checksum is the exact
wrong thing to do.

Failing the read with -EIO, and not touching the data or checksums, is the
proper thing to do.

> Even though checksum is written first, the recalculated checksum will be
> different with the written checksum because data is not written.

You missed an important point.  If you read the block and the checksum and they
don't match, you don't know if the checksum is wrong because it's stale, or if
the data has been corrupted.

That's part of why there are 2 checksums, one before and one after the data
block.  That way, if the two checksums match each other but not the data, you
know that something has corrupted the data.  If the two checksums don't match,
it gets more interesting:

If the first one matches the data and the second doesn't, then either the second
one has gotten corrupted, or the system died between writing the data and the
second checksum.  But that's OK, because the first checksum says the data update
did succeed, so simply patching the second checksum is OK.

If the first one doesn't match and the second one *does*, then either the system died
between writing the first checksum and the data, or the first one is corrupted - and you
don't have a good way to distinguish between them unless you have timestamps.

If neither checksum matches the data, then you're pretty sure the system died
between the first checksum and finishing the data write.
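The case analysis above can be sketched as a small C classifier. This is only an illustration, assuming the write order is first checksum, then data, then second checksum; the FNV-1a hash stands in for whatever checksum the real scheme uses, and `csum()`, `classify()` and the `verdict` values are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum (32-bit FNV-1a); a real scheme would use CRC32C or a hash. */
static uint32_t csum(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical verdicts, one per case discussed above. */
enum verdict {
	BLOCK_OK,        /* both checksums match the data */
	DATA_CORRUPT,    /* checksums agree with each other but not with the data */
	PATCH_SECOND,    /* first matches: update completed, second is stale or bad */
	AMBIGUOUS,       /* second matches: died before the data write, or first is bad */
	TORN_DATA_WRITE  /* neither matches, and they differ: died during the data write */
};

/* Assumes the write order: first checksum, data, second checksum. */
static enum verdict classify(const unsigned char *data, size_t len,
			     uint32_t first, uint32_t second)
{
	uint32_t actual;
	int m1, m2;

	assert(data != NULL);
	actual = csum(data, len);
	m1 = (first == actual);
	m2 = (second == actual);

	if (m1 && m2)
		return BLOCK_OK;
	if (first == second)	/* agree with each other, not with the data */
		return DATA_CORRUPT;
	if (m1)
		return PATCH_SECOND;
	if (m2)
		return AMBIGUOUS;
	return TORN_DATA_WRITE;
}
```

Only `PATCH_SECOND` is a candidate for automatic repair; every other non-OK verdict is a case where the read should fail with -EIO, as argued above.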

Questions for Lukas:

First off, see my comment about -EIO.  Do you have plans for an ioctl or
other way for userspace to get the two checksums so diagnostic programs
can do better error diagnosis/recovery?

If I understand what you're doing, each 4096-byte (or whatever) block will actually
take (4096 + 2 * checksum size) bytes, which means each logically consecutive
block will be offset from the start of a physical block by some amount.  This
effectively means that you are guaranteed one read-modify-write, and possibly
two, for each write. (The alternative is to devote an entire block to each
checksum, but that triples the size, and at that point you may as well just
do a 2+1 raidset.)
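To make the offset arithmetic concrete, here is a minimal C sketch. It assumes the layout described above: each 4096-byte logical block is stored as checksum, data, checksum, packed back to back on a device with 4096-byte physical blocks. `record_start()` and `rmw_blocks()` are hypothetical helpers, and 32-byte checksums are only an example size.

```c
#include <assert.h>
#include <stdint.h>

#define BLK 4096u	/* physical (and logical data) block size, assumed */

/* Byte offset where logical block n's packed record begins. */
static uint64_t record_start(uint64_t n, uint32_t csum_size)
{
	return n * (BLK + 2u * (uint64_t)csum_size);
}

/*
 * Count the physical blocks a record covers only partially; each of those
 * needs a read-modify-write.  With non-zero checksums a record is longer
 * than BLK, so its first and last physical blocks are always different.
 */
static unsigned rmw_blocks(uint64_t n, uint32_t csum_size)
{
	uint64_t start, end;
	unsigned partial = 0;

	assert(csum_size > 0);
	start = record_start(n, csum_size);
	end = start + BLK + 2u * (uint64_t)csum_size;	/* exclusive */

	if (start % BLK != 0)
		partial++;	/* head of the record shares a physical block */
	if (end % BLK != 0)
		partial++;	/* tail of the record shares a physical block */
	return partial;
}
```

With 32-byte checksums, block 0 starts aligned but ends 64 bytes into the next physical block (one RMW), while block 1 starts at byte 4160 and is misaligned at both ends (two RMWs).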

Even if your hardware is willing to do the RMW cycle in hardware, that still
hits you for at least one rotational latency, and possibly two.  If you have to
do the RMW in software, it gets a *lot* more painful (and actually *ensuring*
atomic writes gets more challenging).   At that point, are you still gaining
performance over the current dm-integrity scheme?

(There's also a lot more ugly that happens on high-end storage devices, where
your logical device is actually an 8+2 RAID6 LUN striped across 10 volumes - even
a single 4K write is guaranteed to be an RMW, and you need to do a 32K write to
make it really be a write.)

IBM's GPFS, SGI's CXFS, and probably other high-end file systems as well, go
another level of crazy in order to get high performance - you end up striping
the filesystem across 4 or 8 LUNs, so you want a logical blocksize that gets
you 4 or 8 times the 32K that each LUN wants to see.
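The stripe arithmetic can be sketched in C as well. This assumes the geometry from the example above (8 data disks per RAID6 LUN, 4K per disk, so a 32K full stripe per LUN) and a filesystem that stripes across LUNs one LUN-stripe at a time; the helper names are hypothetical, and real arrays report their own chunk and stripe sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SIZE 4096u	/* bytes per data disk per stripe (assumed) */
#define DATA_DISKS 8u		/* the "8" in 8+2 RAID6 */

/* Full-stripe size of one LUN: 8 * 4K = 32K. */
static uint64_t lun_stripe(void)
{
	return (uint64_t)CHUNK_SIZE * DATA_DISKS;
}

/* Logical block size that hands every LUN one full stripe at once. */
static uint64_t fs_block_size(unsigned luns)
{
	return lun_stripe() * luns;
}

/* Only writes covering whole, aligned stripes avoid a read-modify-write. */
static bool is_full_stripe_write(uint64_t offset, uint64_t len, uint64_t stripe)
{
	assert(stripe > 0);
	return len > 0 && offset % stripe == 0 && len % stripe == 0;
}
```

With this geometry, a 1K or 4K write at offset 0 fails the check, a 64K write starting on a 32K boundary passes, and `fs_block_size(8)` comes to 256K.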

At which point the storage admin is ready to shoot the end user who writes a
program that does 1K writes, causing your throughput to fall through the
floor. Been there, done that, it gets ugly quickly... :)





Re: Understanding the working of Optimistic DAD Feature.

2020-02-06 Thread Valdis Klētnieks
On Thu, 06 Feb 2020 12:19:26 +, Chinmay Agarwal said:

> To check the same there is a condition in kernel code wherein we check if 
> ipv6.devconf_all is set.
> Now, my query is that we are checking if forwarding is enabled on all 
> interfaces, then we consider the system to be a router.

> But even if forwarding is enabled from few interfaces(not all) isn't the 
> system behaving like a router?

You can't actually configure "routing on some but not all" without setting the
global forwarding bit.

If you have the very odd use case where you have eth0, eth1, and eth2, and
you're routing between eth0 and eth1, but eth2 is a private net that should
*not* communicate with either eth0 or eth1, the way you configure that is to
turn on the global forwarding bit, and then use a combination of routing table and
firewall rules to prevent traffic going to eth2 unless it's from the local
host.
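From userspace, the global bit lives in /proc/sys/net/ipv6/conf/all/forwarding, with the per-interface bits next to it under /proc/sys/net/ipv6/conf/<ifname>/forwarding. Below is a minimal C sketch for checking it; the helper names are hypothetical, and the routing-table and firewall rules that actually fence off eth2 are outside its scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse sysctl file contents such as "1\n"; any non-zero value means enabled. */
static bool parse_sysctl_bool(const char *text)
{
	assert(text != NULL);
	return strtol(text, NULL, 10) != 0;
}

/* Read a boolean sysctl: 1 if set, 0 if clear, -1 if the file can't be read. */
static int read_sysctl_bool(const char *path)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_sysctl_bool(buf) ? 1 : 0;
}
```

For example, `read_sysctl_bool("/proc/sys/net/ipv6/conf/all/forwarding")` returns 1 on a box that has the global forwarding bit turned on.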







Re: Ordering guarantee inside a single bio?

2020-01-30 Thread Valdis Klētnieks
On Thu, 30 Jan 2020 15:16:17 +0100, Lukas Straub said:

> No, csum_next (and csum_prev) is a whole sector (i.e. physical block)
> containing all the checksums for the following chunk of data (which
> spans multiple sectors)

Oh, OK.  That works too. :)

> Regarding the ordering guarantee, I have now gathered that the kernel
> will happily split the bio if the size is not optimal for the hardware
> which means it's not guaranteed - right?

And if the kernel doesn't split it and reorder the chunks, the hardware will
happily do so - and lie to you the whole way.  Ever since the hardware people
realized they could get away with lying about turning off the writeback cache,
it's been more and more of a challenge to guarantee correct performance from
storage subsystems.
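Since neither the block layer nor the device promises to keep the pieces of one request in order, the usual answer is to make the ordering explicit: issue each write, and wait for it to reach stable storage before issuing the next. A minimal userspace sketch of the checksum/data/checksum sequence with pwrite() and fdatasync(), assuming a plain file or block device as the backing store; `write_durably()` and `ordered_update()` are hypothetical helpers. Inside the kernel the corresponding tools are the REQ_PREFLUSH and REQ_FUA request flags rather than fdatasync(). Either way, this still assumes the device honors cache flushes, which is exactly what lying hardware undermines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write buf fully at off, then wait for it to hit stable storage (if the device tells the truth). */
static int write_durably(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n <= 0)
			return -1;	/* error, or no progress */
		p += n;
		len -= (size_t)n;
		off += n;
	}
	return fdatasync(fd);
}

/* Hypothetical layout: [first csum][data][second csum], each step ordered. */
static int ordered_update(int fd, off_t off, uint32_t csum,
			  const void *data, size_t len)
{
	assert(data != NULL);
	if (write_durably(fd, &csum, sizeof(csum), off) < 0)
		return -1;
	if (write_durably(fd, data, len, off + (off_t)sizeof(csum)) < 0)
		return -1;
	return write_durably(fd, &csum, sizeof(csum),
			     off + (off_t)sizeof(csum) + (off_t)len);
}
```

The cost is three flush round-trips per update instead of one, which is why schemes like dm-integrity work hard to batch them.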





Re: Kernel drivers and IOCTLs

2020-01-23 Thread Valdis Klētnieks
On Tue, 21 Jan 2020 22:27:01 -0600, WyoFlippa said:

> I'm working on a driver that would verify a Linux or U-Boot image is
> secure and I need to pass parameters such as the public key, starting
> address, etc.

This is actually a lot harder to do properly than it looks, especially if
you're trying to export the information to userspace - a compromised kernel can
simply hijack your ioctl or /proc or /sys file and report that it's not
compromised. You can't even easily use public/private keys to sign the
statement that it's not compromised, because if the legit kernel has access to the
private (signing) key, the compromised code probably does too.

And if you're defending against sufficiently well-financed attackers, it may
even be difficult for a driver to verify the rest of the kernel isn't
compromised. As a fairly obvious attack, consider a kernel with two sets of page
table mappings: first, a set that contains the original kernel code and is
mapped in when your driver is executing, and then the *real* set that maps in
other physical pages containing the skullduggery code, which gets mapped in
when there's something evil being done.

So what *actual* problem are you trying to solve by using a driver to verify
the image is "secure" (which needs further definition, but you probably already
knew that if your skill level is up to doing this right...)?  In particular, what
are you trying to do that various secure boot schemes don't address?





Re: Kernel drivers and IOCTLs

2020-02-04 Thread Valdis Klētnieks
On Tue, 04 Feb 2020 20:57:24 -0600, WyoFlippa said:

> I'm actually happy with the existing boot schemes. In this case, the
> driver is going to validate a signed image (U-Boot or Linux) before it
> is programmed into the flash memory. Although the image is validated
> when booting, it is one additional check to avoid surprises.

Is there a reason you're trying to do it from a driver rather than from userspace?

Under what realistic conditions will the kernel be trusted to do the validation
while userspace isn't? What's the threat model here - in other words, what
attack(s) are you trying to stop?  (This is a lot trickier than it looks - over the
decades, I've seen plenty of "Let's do this cargo-cult thing to stop attack X",
while overlooking the fact that any attacker who can do X can equally easily
do Y and still pwn the entire box.)





Re: New text

2020-02-20 Thread Valdis Klētnieks
On Wed, 19 Feb 2020 18:17:36 -0500, Ruben Safir said:

> > 1) Start wading through the git log until you find the commit that
> > changed the API. In either that commit, or a commit in the same series,
> > whoever changed the API
>
> I don't think that will be a useful way to learn to code the kernel.

That was addressing the specific case of "I need to update an out of tree
driver to a recent kernel".

And I didn't say the kernel is impossible to learn - only that there's no
special consideration given in-tree to beginning kernel hackers.  If you don't
already understand topics like locking and caching and file systems, you need
to learn those elsewhere.





Re: uxfs review

2020-02-20 Thread Valdis Klētnieks
On Thu, 20 Feb 2020 18:39:21 +0100, Valentin Vidic said:

> Now that you mention 2.6 :), I've been trying to update this uxfs
> filesystem from an old book[1] for modern kernels:
>
>   https://github.com/vvidic/uxfs/
>
> It currently builds and seems to work but it could probably use a
> review if someone has a bit of time?  I can post the diff here if
> that would be easier.

Do you have an actual uxfs filesystem to test against?  Having the
mkfs.uxfs and fsck.uxfs utilities would be useful as well.





Re: New text

2020-02-19 Thread Valdis Klētnieks
On Wed, 19 Feb 2020 10:54:10 -0500, Ruben Safir said:
> is there currently a recommended text to learn kernel development from?

The first thing to remember is that the kernel has no obligation to be easy for
new programmers. It's more like a Formula 1 race car than a Ford Escort -
there's an expectation that you *really* know what you're doing when you put
your hands on the steering wheel.

As a result, the authoritative source on that is, of course:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Seriously - the kernel is a work in progress, and it's in motion all the time.
There's no way to write a text that's up to date - by the time you finish, the
source tree has moved by 3 or 4 million lines.

Currently, the best is probably Linux Device Drivers, Third Edition (just
google for LDD3 and you'll get a pointer to the PDF).

However - keep in mind that it discusses *concepts* like locking and variable
lifetimes and so on.  It's now somewhere over 15 million lines out of date, and
it's expected that if you can't figure out for yourself what concepts mean, or
what the *current* API is, then you probably shouldn't be writing large chunks
of kernel code that requires understanding the current APIs.

Note that many bugs are fixable without knowing all that - it's often a
"somebody added A and B, when it should have been A - B" or a stupid null
pointer mistake, or similar.  And drivers/staging is just *full* of stuff that
is in drastic need of help.

Protip: if you've been handed a device driver for a 4.7 kernel, and been told to make
it work under a 5.5 kernel, and you have *no* idea how to fix all the compile errors,
the way to proceed is usually:

0) Kernel API changes are *supposed* to break a compile - if you used to pass
an integer as the third parameter, and now it's a pointer to a structure, the
build will break.  Anybody who makes a kernel API change where the third
parameter used to be an integer that meant one thing, and changes it to an
integer that means something else, so it will compile but not work correctly,
will probably get smacked around with a large trout or something. (A common
workaround for this is changing a function from (struct A *, struct B *, oldint)
to (struct A *, newint, struct B *) so the compiler will whine.)
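Here is a toy version of that trick in C. Everything in it (`struct foo`, `struct bar`, `foo_setup()`, `FOO_MODE_FAST`) is made up for illustration; the point is only that moving the new integer parameter to a different slot turns every un-updated call site into a compiler complaint instead of a silent bug.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int mode; };
struct bar { int id; };

#define FOO_MODE_FAST 2	/* hypothetical new-style mode constant */

/*
 * Old API (hypothetical): int foo_setup(struct foo *a, struct bar *b, int flags);
 * New API: the int moved to the second slot and changed meaning.
 */
static int foo_setup(struct foo *a, int mode, struct bar *b)
{
	assert(a != NULL && b != NULL);
	a->mode = mode;
	return b->id;
}

/*
 * An old-style caller such as
 *     foo_setup(&f, &b, 4);
 * now passes a struct bar * where an int is expected, and a non-zero int
 * where a struct bar * is expected, so the compiler complains about both
 * arguments instead of the driver quietly misbehaving.
 */
```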

1) Start wading through the git log until you find the commit that changed the
API. In either that commit, or a commit in the same series, whoever changed the
API also changed all the in-tree users of the API.

2) Look at that commit, and see what got changed in all the in-tree users.

3) Do the same thing to your out-of-tree code.

And here you thought it was difficult. :)

(Of course, if you're trying to get a 2.6.12 driver working on 5.6... you may decide
that it's time to start drinking heavily. :)




Re: .config

2020-02-21 Thread Valdis Klētnieks
On Fri, 21 Feb 2020 23:41:10 +, Enzo Desiage said:

> modules I actually need (thanks find_all_modules.sh). I'm trying to devise
> a strategy to take the output of that script and make a brand new minimal
> .config file. However, the only solution I've found is doing it manually
> via make menuconfig. This seems to be a rather long process, which I
> wouldn't mind doing if there aren't any other alternatives.
> Is there any other way?

Step 1:  Plug in all your USB widgets and storage and other hot-pluggables
long enough for the kernel to load each device's drivers.

Step 2: 'make localmodconfig'.  This will create a .config that only has those
modules that were actually loaded at the time (there's also an option,
'make LSMOD=<file> localmodconfig', to take the saved output of an 'lsmod',
in case you're doing the builds on a different machine).
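localmodconfig works from the list of currently loaded modules, the same list lsmod prints, which comes from /proc/modules; the first whitespace-separated field on each line is the module name. A minimal C sketch that pulls those names out, e.g. to sanity-check what will be kept before running the make target; `module_name()` and `list_loaded_modules()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the first field (the module name) of a /proc/modules line into out. */
static int module_name(const char *line, char *out, size_t outlen)
{
	size_t n;

	assert(line != NULL && out != NULL && outlen > 0);
	n = strcspn(line, " \t\n");
	if (n == 0 || n >= outlen)
		return -1;
	memcpy(out, line, n);
	out[n] = '\0';
	return 0;
}

/* Print the name of every loaded module, one per line. */
static int list_loaded_modules(void)
{
	char line[512], name[64];
	int at_line_start = 1;
	FILE *f = fopen("/proc/modules", "r");

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		if (at_line_start && module_name(line, name, sizeof(name)) == 0)
			puts(name);
		/* A line longer than the buffer arrives in pieces; only the first starts with a name. */
		at_line_start = strchr(line, '\n') != NULL;
	}
	fclose(f);
	return 0;
}
```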




Re: Kernel debugging

2020-02-21 Thread Valdis Klētnieks
On Wed, 19 Feb 2020 06:19:10 +, "Pankaj Vinadrao Joshi" said:

> I am using linux 5.4.3 with our custom Yocto distro on RISC v machine i want
> to get kernel crash log(hard panic) since RISC v does not have support for the
> kexec how i can collect the crash logs?

Is netconsole an option, if ethernet is available?  If it has a serial port, use
another system as a serial console?  (I admit not knowing how well "console
on a USB serial port" works - if it does, that may be an option)




Re: .config

2020-02-22 Thread Valdis Klētnieks
On Sat, 22 Feb 2020 17:39:03 +, Enzo Desiage said:

> I had a look back at the book and couldn't find the command
> 'make localmodconfig' in "The Linux Kernel  in a Nutshell" book,
> why is that? Was it introduced  in recent years?

Depends what you call "recent". Looks like v2.6.30-ish.

commit 03fa25da8335a942161a8070b3298cfd4edf9b6a
Author: Steven Rostedt 
Date:   Wed Apr 29 22:52:22 2009 -0400

kconfig: make localmodconfig to run streamline_config.pl

Running the streamline_config.pl script manually can still be confusing
for some users. This patch adds the localmodconfig option. This will
automatically run streamline_config.pl on the current .config and
then run "make silentoldconfig" to fix any wholes that might have been
created.

 $ make localmodconfig

This will remove any module configurations in .config that are not needed
to compile the modules that are loaded.

Signed-off-by: Steven Rostedt 




