Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd".
On 06/01/2018 09:03 AM, Russell Coker via Selinux wrote:
> The command "reboot -nffd" (kernel reboot without flushing kernel buffers
> or writing status) when run on a BTRFS system will often result in
> /var/log/audit/audit.log being unlabeled. It also results in some
> systemd-journald files like
> /var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal being
> unlabeled, but that is rarer. I think that the same problem afflicts both
> systemd-journald and auditd, but it's a race condition that on my systems
> (both production and test) is more likely to affect auditd.
>
> If this issue affected only "reboot -nffd" then a solution might be simply
> not to run that command. However, this also affects systems after a power
> outage.
>
> I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security
> update for Debian/Stretch, which is the latest supported release of
> Debian). I have also reproduced it in an identical manner with kernel
> 4.16.0-1-amd64 (the latest from Debian/Unstable). For testing I reproduced
> this with a 4G filesystem in a VM, but in production it has happened on
> BTRFS RAID-1 arrays, both SSD and HDD.
>
> #!/bin/bash
> set -e
> COUNT=$(ps aux|grep [s]bin/auditd|wc -l)
> date
> if [ "$COUNT" = "1" ]; then
>   echo "all good"
> else
>   echo "failed"
>   exit 1
> fi
>
> Firstly, the above is the script /usr/local/sbin/testit; I test for auditd
> running because it aborts if the context on its log file is wrong.
>
> root@stretch:~# ls -liZ /var/log/audit/audit.log
> 37952 -rw---. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun 1 12:23 /var/log/audit/audit.log
>
> Above is before I do the tests.
>
> while ssh stretch /usr/local/sbin/testit ; do
>   ssh btrfs-local "reboot -nffd" > /dev/null 2>&1 &
>   sleep 20
> done
>
> Above is the shell code I run to do the tests. Note that the VM in
> question runs on SSD storage, which is why it can consistently boot in
> less than 20 seconds.
> Fri 1 Jun 12:26:13 UTC 2018
> all good
> Fri 1 Jun 12:26:33 UTC 2018
> failed
>
> Above is the output from the shell code in question. After the first
> reboot it fails. The probability of failure on my test system is greater
> than 50%.
>
> root@stretch:~# ls -liZ /var/log/audit/audit.log
> 37952 -rw---. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun 1 12:26 /var/log/audit/audit.log
>
> Now the result. Note that the inode has not changed. I could understand a
> newly created file missing an xattr, but this is an existing file which
> shouldn't have had its xattr changed. But somehow it gets corrupted.
>
> Could this be the fault of SE Linux code? I don't think it's likely, but
> this is what the BTRFS developers will ask, so it's best to discuss it
> here before sending it to them.

No, that's definitely a filesystem bug. It is the filesystem's
responsibility to ensure that new inodes are assigned a security.* xattr in
the same transaction as the file creation (ext[234] does this, for example,
via ext4_init_security()), and that they don't lose them. SELinux just
provides the xattr suffix ("selinux") and the value/value_len pair.

> Does anyone have any ideas of other tests I should run? Anyone want me to
> try a different kernel? I can give root on a VM to anyone who wants to
> poke at it. Anything else I should add when sending this to the BTRFS
> developers?

___
Selinux mailing list
Selinux@tycho.nsa.gov
To unsubscribe, send email to selinux-le...@tycho.nsa.gov.
To get help, send an email containing "help" to selinux-requ...@tycho.nsa.gov.
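The symptom above (same inode, lost label) can be checked mechanically from
userspace. Below is a small Python sketch; the `label_state` helper is
hypothetical and not from the thread, and it assumes a Linux system where
`os.getxattr` is available:

```python
import os


def label_state(path):
    """Return (inode_number, selinux_label_or_None) for path.

    The label is read from the security.selinux extended attribute;
    os.getxattr raises OSError if the attribute cannot be read (for
    example on a kernel without SELinux and with no stored label).
    """
    ino = os.stat(path).st_ino
    try:
        label = os.getxattr(path, "security.selinux")
    except OSError:
        label = None
    return ino, label


# Usage sketch: record the state before the crash test, compare after.
# before = label_state("/var/log/audit/audit.log")
# ... power failure or "reboot -nffd" ...
# after = label_state("/var/log/audit/audit.log")
# assert before[0] == after[0]  # the inode is unchanged in the report above
```

On the reporter's system the inode number would compare equal across the
reboot while the stored label changes, which is what points at the
filesystem rather than at file recreation by auditd.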
Re: [PATCH] staging: lustre: delete the filesystem from the tree.
On Mon, Jun 04, 2018 at 12:09:22AM -0700, Christoph Hellwig wrote:
> On Fri, Jun 01, 2018 at 09:08:39PM +0200, Greg Kroah-Hartman wrote:
> > Please, compare yourself to orangefs. That is the perfect example of
> > how to do everything right. They got their code into staging, cleaned
> > it up, talked to us about what needed to be done to get the remaining
> > bits in proper shape, they assigned dedicated developers to do that
> > work, talked with all of us at different conferences around the world
> > to check up and constantly ensure that they were doing the right
> > thing, and most importantly, they asked for feedback and acted on it.
> > In the end, their codebase is much smaller, works better, is in the
> > "real" part of the kernel, and available to every Linux user out there.
>
> FYI, orangefs never went through the staging tree. Which might be one
> reason why it got merged so quickly - allowing rapid iteration without
> respect to merge windows, and doing all the trivial cleanups either
> before or after (but not at the same time as) the feature work really
> does help productivity.

Ah, my mistake; for some reason I thought it did. I guess I had offered to
take it that way if the developers wanted it.

And yes, doing all of the needed cleanups and other changes outside of the
kernel tree should be much, much faster, which is why I bet it would only
take 6 months max to get lustre merged "properly" if they really wanted to
do it, by working out-of-tree. Heck, they already have an out-of-tree repo
today, so it's not like removing the in-kernel version is going to change
their normal development workflow :(

greg k-h
Re: [PATCH] staging: lustre: delete the filesystem from the tree.
On Fri, Jun 01, 2018 at 09:08:39PM +0200, Greg Kroah-Hartman wrote:
> Please, compare yourself to orangefs. That is the perfect example of how
> to do everything right. They got their code into staging, cleaned it up,
> talked to us about what needed to be done to get the remaining bits in
> proper shape, they assigned dedicated developers to do that work, talked
> with all of us at different conferences around the world to check up and
> constantly ensure that they were doing the right thing, and most
> importantly, they asked for feedback and acted on it. In the end, their
> codebase is much smaller, works better, is in the "real" part of the
> kernel, and available to every Linux user out there.

FYI, orangefs never went through the staging tree. Which might be one
reason why it got merged so quickly - allowing rapid iteration without
respect to merge windows, and doing all the trivial cleanups either before
or after (but not at the same time as) the feature work really does help
productivity.
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On Jun 3, 2018, at 9:59 PM, Alexey Lyashkov wrote:
>
>> On Sun, Jun 03 2018, Dilger, Andreas wrote:
>>
>>> LNet is originally based on a high-performance networking stack called
>>> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for
>>> LNet routing to allow cross-network bridging.
>>>
>>> A critical part of LNet is that it is for RDMA and not packet-based
>>> messages. Everything in Lustre is structured around RDMA. Of course,
>>> RDMA is not possible with TCP
>
> To be clear: the Soft IB (aka Soft RoCE) driver has been part of the OFED
> stack since 4.8 (or 4.9). So the RDMA API is now possible with TCP
> networks.

Well, strictly speaking RoCE still isn't possible with TCP networks. RoCE
v1 is an Ethernet layer protocol (not IP based), while RoCE v2 is UDP/IP
based.

Cheers, Andreas
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On 4 June 2018, at 6:54, NeilBrown wrote:

> On Sun, Jun 03 2018, Dilger, Andreas wrote:
>
>> On Jun 1, 2018, at 17:19, NeilBrown wrote:
>>>
>>> On Fri, Jun 01 2018, Doug Oucharek wrote:
>>>
>>>> Would it make sense to land LNet and LNDs on their own first? Get the
>>>> networking house in order first before layering on the file system?
>>>
>>> I'd like to turn that question on its head:
>>> Do we need LNet and LNDs? What value do they provide?
>>> (this is a genuine question, not being sarcastic).
>>>
>>> It is a while since I tried to understand LNet, and then it was a
>>> fairly superficial look, but I think it is an abstraction layer that
>>> provides packet-based send/receive with some numa-awareness and
>>> routing functionality. It sits over sockets (TCP) and IB and provides
>>> a uniform interface.
>>
>> LNet is originally based on a high-performance networking stack called
>> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for
>> LNet routing to allow cross-network bridging.
>>
>> A critical part of LNet is that it is for RDMA and not packet-based
>> messages. Everything in Lustre is structured around RDMA. Of course,
>> RDMA is not possible with TCP

To be clear: the Soft IB (aka Soft RoCE) driver has been part of the OFED
stack since 4.8 (or 4.9). So the RDMA API is now possible with TCP networks.

Alex
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On Sun, Jun 03 2018, Dilger, Andreas wrote:

> On Jun 1, 2018, at 17:19, NeilBrown wrote:
>>
>> On Fri, Jun 01 2018, Doug Oucharek wrote:
>>
>>> Would it make sense to land LNet and LNDs on their own first? Get the
>>> networking house in order first before layering on the file system?
>>
>> I'd like to turn that question on its head:
>> Do we need LNet and LNDs? What value do they provide?
>> (this is a genuine question, not being sarcastic).
>>
>> It is a while since I tried to understand LNet, and then it was a
>> fairly superficial look, but I think it is an abstraction layer that
>> provides packet-based send/receive with some numa-awareness and routing
>> functionality. It sits over sockets (TCP) and IB and provides a uniform
>> interface.
>
> LNet is originally based on a high-performance networking stack called
> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
> routing to allow cross-network bridging.
>
> A critical part of LNet is that it is for RDMA and not packet-based
> messages. Everything in Lustre is structured around RDMA. Of course,
> RDMA is not possible with TCP so it just does send/receive under the
> covers, though it can do zero-copy data sends (and at one time zero-copy
> receives, but those changes were rejected by the kernel maintainers).
> It definitely does RDMA with IB, RoCE, OPA in the kernel, and other RDMA
> network types not in the kernel (e.g. Cray Gemini/Aries, Atos/Bull BXI,
> and previously older network types no longer supported).

Thanks! That will probably help me understand it more easily next time I
dive in.

> Even with TCP it has some improvements for performance, such as using
> separate sockets for send and receive of large messages, as well as a
> socket for small messages that has Nagle disabled so that it does not
> delay those packets for aggregation.

That sounds like something that could benefit NFS...
pNFS already partially does this by virtue of the fact that data often goes
to a different server than control, so a different socket is needed. I
wonder if it could benefit from more explicit separation by message size.

Thanks a lot for this background info!

NeilBrown

> In addition to the RDMA support, there is also multi-rail support in the
> out-of-tree version that we haven't been allowed to land, which can
> aggregate network bandwidth. While there exists channel bonding for TCP
> connections, that does not exist for IB or other RDMA networks.
>
>> That is almost a description of the xprt layer in sunrpc. sunrpc
>> doesn't have routing, but it does have some numa awareness (for the
>> server side at least) and it definitely provides packet-based
>> send/receive over various transports - tcp, udp, local (unix domain),
>> and IB.
>> So: can we use sunrpc/xprt in place of LNet?
>
> No, that would totally kill the performance of Lustre.
>
>> How much would we need to enhance sunrpc/xprt for this to work? What
>> hooks would be needed to implement the routing as a separate layer.
>>
>> If LNet is, in some way, much better than sunrpc, then can we share
>> that superior functionality with our NFS friends by adding it to
>> sunrpc?
>
> There was some discussion at NetApp about adding a Lustre/LNet transport
> for pNFS, but I don't think it ever got beyond the proposal stage:
>
> https://tools.ietf.org/html/draft-faibish-nfsv4-pnfs-lustre-layout-07
>
>> Maybe the answer to this is "no", but I think LNet would be hard to
>> sell without a clear statement of why that was the answer.
>
> There are other users outside of the kernel tree that use LNet in
> addition to just Lustre. The Cray "DVS" I/O forwarding service[*] uses
> LNet, and another experimental filesystem named Zest[+] also used LNet.
> [*] https://www.alcf.anl.gov/files/Sugiyama-Wallace-Thursday16B-slides.pdf
> [+] https://www.psc.edu/images/zest/zest-sc07-paper.pdf
>
>> One reason that I would like to see lustre stay in drivers/staging (so
>> I do not support Greg's patch) is that this sort of transition of
>> Lustre to using an improved sunrpc/xprt would be much easier if both
>> were in the same tree. Certainly it would be easier for a larger
>> community to be participating in the work.
>
> I don't think the proposal to encapsulate all of the Lustre protocol
> into pNFS made a lot of sense, since this would have only really been
> available on Linux, at which point it would be better to use the native
> Lustre client rather than funnel everything through pNFS.
>
> However, _just_ using the LNet transport for (p)NFS might make sense.
> LNet is largely independent from Lustre (it used to be a separate source
> tree) and is very efficient over the network.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On Jun 1, 2018, at 17:19, NeilBrown wrote:
>
> On Fri, Jun 01 2018, Doug Oucharek wrote:
>
>> Would it make sense to land LNet and LNDs on their own first? Get the
>> networking house in order first before layering on the file system?
>
> I'd like to turn that question on its head:
> Do we need LNet and LNDs? What value do they provide?
> (this is a genuine question, not being sarcastic).
>
> It is a while since I tried to understand LNet, and then it was a fairly
> superficial look, but I think it is an abstraction layer that provides
> packet-based send/receive with some numa-awareness and routing
> functionality. It sits over sockets (TCP) and IB and provides a uniform
> interface.

LNet is originally based on a high-performance networking stack called
Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
routing to allow cross-network bridging.

A critical part of LNet is that it is for RDMA and not packet-based
messages. Everything in Lustre is structured around RDMA. Of course, RDMA
is not possible with TCP so it just does send/receive under the covers,
though it can do zero-copy data sends (and at one time zero-copy receives,
but those changes were rejected by the kernel maintainers). It definitely
does RDMA with IB, RoCE, OPA in the kernel, and other RDMA network types
not in the kernel (e.g. Cray Gemini/Aries, Atos/Bull BXI, and previously
older network types no longer supported).

Even with TCP it has some improvements for performance, such as using
separate sockets for send and receive of large messages, as well as a
socket for small messages that has Nagle disabled so that it does not
delay those packets for aggregation.

In addition to the RDMA support, there is also multi-rail support in the
out-of-tree version that we haven't been allowed to land, which can
aggregate network bandwidth. While there exists channel bonding for TCP
connections, that does not exist for IB or other RDMA networks.
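The small-message socket described above relies on a standard socket
option. A minimal userspace Python illustration of such a socket follows;
it is a sketch only, not Lustre's actual TCP LND, which lives in the kernel
and does not use this API:

```python
import socket


def small_message_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY),
    so small writes are sent immediately instead of being held back for
    aggregation with later data."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


s = small_message_socket()
# The option reads back nonzero once set.
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

The trade-off is the usual one: disabling Nagle lowers latency for small
RPC-style messages at the cost of more packets on the wire, which is why a
separate socket is kept for bulk transfers.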
> That is almost a description of the xprt layer in sunrpc. sunrpc
> doesn't have routing, but it does have some numa awareness (for the
> server side at least) and it definitely provides packet-based
> send/receive over various transports - tcp, udp, local (unix domain),
> and IB.
> So: can we use sunrpc/xprt in place of LNet?

No, that would totally kill the performance of Lustre.

> How much would we need to enhance sunrpc/xprt for this to work? What
> hooks would be needed to implement the routing as a separate layer.
>
> If LNet is, in some way, much better than sunrpc, then can we share that
> superior functionality with our NFS friends by adding it to sunrpc?

There was some discussion at NetApp about adding a Lustre/LNet transport
for pNFS, but I don't think it ever got beyond the proposal stage:

https://tools.ietf.org/html/draft-faibish-nfsv4-pnfs-lustre-layout-07

> Maybe the answer to this is "no", but I think LNet would be hard to sell
> without a clear statement of why that was the answer.

There are other users outside of the kernel tree that use LNet in addition
to just Lustre. The Cray "DVS" I/O forwarding service[*] uses LNet, and
another experimental filesystem named Zest[+] also used LNet.

[*] https://www.alcf.anl.gov/files/Sugiyama-Wallace-Thursday16B-slides.pdf
[+] https://www.psc.edu/images/zest/zest-sc07-paper.pdf

> One reason that I would like to see lustre stay in drivers/staging (so I
> do not support Greg's patch) is that this sort of transition of Lustre
> to using an improved sunrpc/xprt would be much easier if both were in
> the same tree. Certainly it would be easier for a larger community to be
> participating in the work.

I don't think the proposal to encapsulate all of the Lustre protocol into
pNFS made a lot of sense, since this would have only really been available
on Linux, at which point it would be better to use the native Lustre client
rather than funnel everything through pNFS.
However, _just_ using the LNet transport for (p)NFS might make sense. LNet
is largely independent from Lustre (it used to be a separate source tree)
and is very efficient over the network.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On Fri, Jun 01 2018, Greg Kroah-Hartman wrote:
>
> So, let's just delete the whole mess. Now the lustre developers can go
> off and work in their out-of-tree codebase and not have to worry about
> providing valid changelog entries and breaking their patches up into
> logical pieces.

I find it incredible that anyone would think that not having to "worry
about providing valid changelogs" and not "breaking their patches up into
logical pieces" could ever be seen as a good idea. I hope that if lustre
development is excluded from mainline for a time, we can still maintain the
practices that demonstrably work so well.

For the record: I'm not in favor of ejecting this code from mainline. I
think that the long-term result may be that it never comes back, and it
will likely at least delay the process. But you must do what you think is
best.

Thanks,
NeilBrown
Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.
On Fri, Jun 01 2018, Doug Oucharek wrote:

> Would it make sense to land LNet and LNDs on their own first? Get the
> networking house in order first before layering on the file system?

I'd like to turn that question on its head:
Do we need LNet and LNDs? What value do they provide?
(this is a genuine question, not being sarcastic).

It is a while since I tried to understand LNet, and then it was a fairly
superficial look, but I think it is an abstraction layer that provides
packet-based send/receive with some numa-awareness and routing
functionality. It sits over sockets (TCP) and IB and provides a uniform
interface.

That is almost a description of the xprt layer in sunrpc. sunrpc doesn't
have routing, but it does have some numa awareness (for the server side at
least) and it definitely provides packet-based send/receive over various
transports - tcp, udp, local (unix domain), and IB.
So: can we use sunrpc/xprt in place of LNet?

How much would we need to enhance sunrpc/xprt for this to work? What hooks
would be needed to implement the routing as a separate layer?

If LNet is, in some way, much better than sunrpc, then can we share that
superior functionality with our NFS friends by adding it to sunrpc?

Maybe the answer to this is "no", but I think LNet would be hard to sell
without a clear statement of why that was the answer.

One reason that I would like to see lustre stay in drivers/staging (so I do
not support Greg's patch) is that this sort of transition of Lustre to
using an improved sunrpc/xprt would be much easier if both were in the same
tree. Certainly it would be easier for a larger community to be
participating in the work.

Thanks,
NeilBrown