Re: BTRFS losing SE Linux labels on power failure or "reboot -nffd".

2018-06-04 Thread Stephen Smalley
On 06/01/2018 09:03 AM, Russell Coker via Selinux wrote:
> The command "reboot -nffd" (kernel reboot without flushing kernel buffers or 
> writing status) when run on a BTRFS system will often result in 
> /var/log/audit/audit.log being unlabeled. It also results in some 
> systemd-journald files like 
> /var/log/journal/c195779d29154ed8bcb4e8444c4a1728/system.journal being 
> unlabeled but that is rarer. I think that the same problem afflicts both 
> systemd-journald and auditd but it's a race condition that on my systems 
> (both production and test) is more likely to affect auditd.
> 
> If this issue affected only "reboot -nffd" then a solution might be simply 
> not to run that command. However, this also affects systems after a power 
> outage.
> 
> I have reproduced this bug with kernel 4.9.0-6-amd64 (the latest security 
> update for Debian/Stretch, which is the latest supported release of Debian). 
> I have also reproduced it in an identical manner with kernel 4.16.0-1-amd64 
> (the latest from Debian/Unstable). For testing I reproduced this with a 4G 
> filesystem in a VM, but in production it has happened on BTRFS RAID-1 
> arrays, both SSD and HDD.
> 
> #!/bin/bash
> set -e
> # Count running auditd processes; the [s] keeps grep from matching itself.
> COUNT=$(ps aux | grep '[s]bin/auditd' | wc -l)
> date
> if [ "$COUNT" = "1" ]; then
>     echo "all good"
> else
>     echo "failed"
>     exit 1
> fi
> 
> The above is the script /usr/local/sbin/testit. I test for auditd running 
> because auditd aborts if the context on its log file is wrong.
> 
> root@stretch:~# ls -liZ /var/log/audit/audit.log
> 37952 -rw-------. 1 root root system_u:object_r:auditd_log_t:s0 4385230 Jun  1 12:23 /var/log/audit/audit.log
> 
> Above is before I do the tests.
> 
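As an aside, the label can also be read directly from the security.selinux
xattr rather than via ls. Here is a minimal sketch in C (a hypothetical
helper, not part of the reported test setup) using getxattr(2); once the
xattr is lost it fails with ENODATA:

/* label.c - print the raw security.selinux xattr of a file */
#include <stdio.h>
#include <sys/xattr.h>

int main(int argc, char **argv)
{
	char buf[256];
	ssize_t len;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}
	len = getxattr(argv[1], "security.selinux", buf, sizeof(buf) - 1);
	if (len < 0) {
		perror("getxattr");	/* ENODATA: no label on this inode */
		return 1;
	}
	buf[len] = '\0';	/* the stored value may omit a trailing NUL */
	printf("%s\n", buf);
	return 0;
}
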
> while ssh stretch /usr/local/sbin/testit ; do
>     # while the label check still passes, crash-reboot the VM and retry
>     ssh btrfs-local "reboot -nffd" > /dev/null 2>&1 &
>     sleep 20
> done
> 
> Above is the shell code I run to do the tests. Note that the VM in question 
> runs on SSD storage, which is why it can consistently boot in less than 20 
> seconds.
> 
> Fri  1 Jun 12:26:13 UTC 2018
> all good
> Fri  1 Jun 12:26:33 UTC 2018
> failed
> 
> Above is the output from the shell code in question. After the first reboot 
> it fails. The probability of failure on my test system is greater than 50%.
> 
> root@stretch:~# ls -liZ /var/log/audit/audit.log
> 37952 -rw-------. 1 root root system_u:object_r:unlabeled_t:s0 4396803 Jun  1 12:26 /var/log/audit/audit.log
> 
> Now the result. Note that the inode has not changed. I could understand a 
> newly created file missing an xattr, but this is an existing file whose 
> xattr should not have changed. Yet somehow it gets corrupted.
> 
> Could this be the fault of SE Linux code? I don't think it's likely, but 
> this is what the BTRFS developers will ask, so it's best to discuss it here 
> before sending the report to them.

No, that's definitely a filesystem bug.  It is the filesystem's responsibility 
to ensure that new inodes are assigned a security.* xattr in the same 
transaction as the file creation (ext[234] does this, e.g. via 
ext4_init_security()), and that they don't lose them.  SELinux just provides 
the xattr suffix ("selinux") and the value/value_len pair.
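
For reference, here is a condensed sketch of that ext4 pattern (names as in
fs/ext4/xattr_security.c, lightly simplified): the LSM hands back the xattrs
to set, and the filesystem writes them under the same journal handle that
creates the inode, so a crash persists both or neither.

static int ext4_initxattrs(struct inode *inode,
			   const struct xattr *xattr_array, void *fs_data)
{
	const struct xattr *xattr;
	handle_t *handle = fs_data;	/* the running journal transaction */
	int err = 0;

	/* Write each security xattr inside the inode-creating transaction. */
	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
		err = ext4_xattr_set_handle(handle, inode,
					    EXT4_XATTR_INDEX_SECURITY,
					    xattr->name, xattr->value,
					    xattr->value_len, XATTR_CREATE);
		if (err < 0)
			break;
	}
	return err;
}

int ext4_init_security(handle_t *handle, struct inode *inode,
		       struct inode *dir, const struct qstr *qstr)
{
	/* SELinux supplies the xattr name/value via this hook. */
	return security_inode_init_security(inode, dir, qstr,
					    &ext4_initxattrs, handle);
}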

> 
> Does anyone have any ideas for other tests I should run? Anyone want me to 
> try a different kernel? I can give root on a VM to anyone who wants to poke 
> at it. Anything else I should add when sending this to the BTRFS developers?


Re: [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread Greg Kroah-Hartman
On Mon, Jun 04, 2018 at 12:09:22AM -0700, Christoph Hellwig wrote:
> On Fri, Jun 01, 2018 at 09:08:39PM +0200, Greg Kroah-Hartman wrote:
> > Please, compare yourself to orangefs.  That is the perfect example of
> > how to do everything right.  They got their code into staging, cleaned
> > it up, talked to us about what was needed to do to get the remaining
> > bits in proper shape, they assigned dedicated developers to do that
> > work, talked with all of us at different conferences around the world to
> > check up and constantly ensure that they were doing the right thing, and
> > most importantly, they asked for feedback and acted on it.  In the end,
> > their codebase is much smaller, works better, is in the "real" part of
> > the kernel, and available to every Linux user out there.
> 
> FYI, orangefs never went through the staging tree.  Which might be
> one reason why it got merged so quickly - allowing rapid iteration
> without respect to merge windows, and doing all the trivial cleanups
> either before or after (but not at the same time as) the feature
> work really does help productivity.

Ah, my mistake; for some reason I thought it did.  I guess I had offered
to take it that way if the developers wanted it.

And yes, doing all of the needed cleanups and other changes outside of
the kernel tree should be much, much faster, which is why I bet it would
only take 6 months max to get lustre merged "properly" if they really
wanted to do it by working out-of-tree.

Heck, they already have an out-of-tree repo today, so it's not like
removing the in-kernel version is going to change their normal
development workflow :(

greg k-h



Re: [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread Christoph Hellwig
On Fri, Jun 01, 2018 at 09:08:39PM +0200, Greg Kroah-Hartman wrote:
> Please, compare yourself to orangefs.  That is the perfect example of
> how to do everything right.  They got their code into staging, cleaned
> it up, talked to us about what was needed to do to get the remaining
> bits in proper shape, they assigned dedicated developers to do that
> work, talked with all of us at different conferences around the world to
> check up and constantly ensure that they were doing the right thing, and
> most importantly, they asked for feedback and acted on it.  In the end,
> their codebase is much smaller, works better, is in the "real" part of
> the kernel, and available to every Linux user out there.

FYI, orangefs never went through the staging tree.  Which might be
one reason why it got merged so quickly - allowing rapid iteration
without respect to merge windows, and doing all the trivial cleanups
either before or after (but not at the same time as) the feature
work really does help productivity.



Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread Andreas Dilger
On Jun 3, 2018, at 9:59 PM, Alexey Lyashkov  wrote:
> 
>> On Sun, Jun 03 2018, Dilger, Andreas wrote:
>> 
>>> LNet is originally based on a high-performance networking stack called
>>> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
>>> routing to allow cross-network bridging.
>>> 
>>> A critical part of LNet is that it is for RDMA and not packet-based
>>> messages.  Everything in Lustre is structured around RDMA.  Of course,
>>> RDMA is not possible with TCP
> 
> To be clear: the Soft IB (aka Soft RoCE) driver has been part of the OFED
> stack since 4.8 (or 4.9), so the RDMA API is now possible with TCP networks.

Well, strictly speaking RoCE still isn't possible with TCP networks.  RoCE v1
is an Ethernet-layer protocol (not IP-based), while RoCE v2 is UDP/IP-based.

Cheers, Andreas

Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread Alexey Lyashkov

> On Jun 4, 2018, at 6:54, NeilBrown  wrote:
> 
> On Sun, Jun 03 2018, Dilger, Andreas wrote:
> 
>> On Jun 1, 2018, at 17:19, NeilBrown  wrote:
>>> 
>>> On Fri, Jun 01 2018, Doug Oucharek wrote:
>>> 
>>>> Would it make sense to land LNet and LNDs on their own first?  Get
>>>> the networking house in order first before layering on the file
>>>> system?
>>> 
>>> I'd like to turn that question on its head:
>>> Do we need LNet and LNDs?  What value do they provide?
>>> (this is a genuine question, not being sarcastic).
>>> 
>>> It has been a while since I tried to understand LNet, and even then it
>>> was a fairly superficial look, but I think it is an abstraction layer
>>> that provides packet-based send/receive with some numa-awareness
>>> and routing functionality.  It sits over sockets (TCP) and IB and
>>> provides a uniform interface.
>> 
>> LNet is originally based on a high-performance networking stack called
>> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
>> routing to allow cross-network bridging.
>> 
>> A critical part of LNet is that it is for RDMA and not packet-based
>> messages.  Everything in Lustre is structured around RDMA.  Of course,
>> RDMA is not possible with TCP

To be clear: the Soft IB (aka Soft RoCE) driver has been part of the OFED
stack since 4.8 (or 4.9), so the RDMA API is now possible with TCP networks.


Alex


Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread NeilBrown
On Sun, Jun 03 2018, Dilger, Andreas wrote:

> On Jun 1, 2018, at 17:19, NeilBrown  wrote:
>> 
>> On Fri, Jun 01 2018, Doug Oucharek wrote:
>> 
>>> Would it make sense to land LNet and LNDs on their own first?  Get
>>> the networking house in order first before layering on the file
>>> system?
>> 
>> I'd like to turn that question on its head:
>>  Do we need LNet and LNDs?  What value do they provide?
>> (this is a genuine question, not being sarcastic).
>> 
>> It has been a while since I tried to understand LNet, and even then it
>> was a fairly superficial look, but I think it is an abstraction layer
>> that provides packet-based send/receive with some numa-awareness
>> and routing functionality.  It sits over sockets (TCP) and IB and
>> provides a uniform interface.
>
> LNet is originally based on a high-performance networking stack called
> Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
> routing to allow cross-network bridging.
>
> A critical part of LNet is that it is for RDMA and not packet-based
> messages.  Everything in Lustre is structured around RDMA.  Of course,
> RDMA is not possible with TCP so it just does send/receive under the
> covers, though it can do zero copy data sends (and at one time zero-copy
> receives, but those changes were rejected by the kernel maintainers).
> It definitely does RDMA with IB, RoCE, OPA in the kernel, and other RDMA
> network types not in the kernel (e.g. Cray Gemini/Aries, Atos/Bull BXI,
> and previously older network types no longer supported).

Thanks!  That will probably help me understand it more easily next time
I dive in.

>
> Even with TCP it has some improvements for performance, such as using
> separate sockets for send and receive of large messages, as well as
> a socket for small messages that has Nagle disabled so that it does
> not delay those packets for aggregation.

That sounds like something that could benefit NFS...
pNFS already partially does this by virtue of the fact that data often
goes to a different server than control, so a different socket is
needed.  I wonder if it could benefit from more explicit separation of
message sizes.


Thanks a lot for this background info!
NeilBrown

>
> In addition to the RDMA support, there is also multi-rail support in
> the out-of-tree version that we haven't been allowed to land, which
> can aggregate network bandwidth.  While there exists channel bonding
> for TCP connections, that does not exist for IB or other RDMA networks.
>
>> That is almost a description of the xprt layer in sunrpc.  sunrpc
>> doesn't have routing, but it does have some numa awareness (for the
>> server side at least) and it definitely provides packet-based
>> send/receive over various transports - tcp, udp, local (unix domain),
>> and IB.
>> So: can we use sunrpc/xprt in place of LNet?
>
> No, that would totally kill the performance of Lustre.
>
>> How much would we need to enhance sunrpc/xprt for this to work?  What
>> hooks would be needed to implement the routing as a separate layer?
>> 
>> If LNet is, in some way, much better than sunrpc, then can we share that
>> superior functionality with our NFS friends by adding it to sunrpc?
>
> There was some discussion at NetApp about adding a Lustre/LNet transport
> for pNFS, but I don't think it ever got beyond the proposal stage:
>
> https://tools.ietf.org/html/draft-faibish-nfsv4-pnfs-lustre-layout-07
>
>> Maybe the answer to this is "no", but I think LNet would be hard to sell
>> without a clear statement of why that was the answer.
>
> There are other users outside of the kernel tree that use LNet in addition
> to just Lustre.  The Cray "DVS" I/O forwarding service[*] uses LNet, and
> another experimental filesystem named Zest[+] also used LNet.
>
> [*] https://www.alcf.anl.gov/files/Sugiyama-Wallace-Thursday16B-slides.pdf
> [+] https://www.psc.edu/images/zest/zest-sc07-paper.pdf
>
>> One reason that I would like to see lustre stay in drivers/staging (so I
>> do not support Greg's patch) is that this sort of transition of Lustre
>> to using an improved sunrpc/xprt would be much easier if both were in
>> the same tree.  Certainly it would be easier for a larger community to
>> be participating in the work.
>
> I don't think the proposal to encapsulate all of the Lustre protocol into
> pNFS made a lot of sense, since this would have only really been available
> on Linux, at which point it would be better to use the native Lustre client
> rather than funnel everything through pNFS.
>
> However, _just_ using the LNet transport for (p)NFS might make sense.  LNet
> is largely independent from Lustre (it used to be a separate source tree)
> and is very efficient over the network.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation



Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread Dilger, Andreas
On Jun 1, 2018, at 17:19, NeilBrown  wrote:
> 
> On Fri, Jun 01 2018, Doug Oucharek wrote:
> 
>> Would it make sense to land LNet and LNDs on their own first?  Get
>> the networking house in order first before layering on the file
>> system?
> 
> I'd like to turn that question on its head:
>  Do we need LNet and LNDs?  What value do they provide?
> (this is a genuine question, not being sarcastic).
> 
> It has been a while since I tried to understand LNet, and even then it
> was a fairly superficial look, but I think it is an abstraction layer
> that provides packet-based send/receive with some numa-awareness
> and routing functionality.  It sits over sockets (TCP) and IB and
> provides a uniform interface.

LNet is originally based on a high-performance networking stack called
Portals (v3, http://www.cs.sandia.gov/Portals/), with additions for LNet
routing to allow cross-network bridging.

A critical part of LNet is that it is for RDMA and not packet-based
messages.  Everything in Lustre is structured around RDMA.  Of course,
RDMA is not possible with TCP so it just does send/receive under the
covers, though it can do zero copy data sends (and at one time zero-copy
receives, but those changes were rejected by the kernel maintainers).
It definitely does RDMA with IB, RoCE, OPA in the kernel, and other RDMA
network types not in the kernel (e.g. Cray Gemini/Aries, Atos/Bull BXI,
and previously older network types no longer supported).

Even with TCP it has some improvements for performance, such as using
separate sockets for send and receive of large messages, as well as
a socket for small messages that has Nagle disabled so that it does
not delay those packets for aggregation.
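
To make the Nagle point concrete, here is a minimal user-space sketch of the
socket option involved (a sketch only; the in-kernel socklnd sets the
equivalent on its small-message socket):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle on a connected TCP socket so small messages go out
 * immediately instead of being held back for coalescing. */
static int disable_nagle(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}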

In addition to the RDMA support, there is also multi-rail support in
the out-of-tree version that we haven't been allowed to land, which
can aggregate network bandwidth.  While there exists channel bonding
for TCP connections, that does not exist for IB or other RDMA networks.

> That is almost a description of the xprt layer in sunrpc.  sunrpc
> doesn't have routing, but it does have some numa awareness (for the
> server side at least) and it definitely provides packet-based
> send/receive over various transports - tcp, udp, local (unix domain),
> and IB.
> So: can we use sunrpc/xprt in place of LNet?

No, that would totally kill the performance of Lustre.

> How much would we need to enhance sunrpc/xprt for this to work?  What
> hooks would be needed to implement the routing as a separate layer?
> 
> If LNet is, in some way, much better than sunrpc, then can we share that
> superior functionality with our NFS friends by adding it to sunrpc?

There was some discussion at NetApp about adding a Lustre/LNet transport
for pNFS, but I don't think it ever got beyond the proposal stage:

https://tools.ietf.org/html/draft-faibish-nfsv4-pnfs-lustre-layout-07

> Maybe the answer to this is "no", but I think LNet would be hard to sell
> without a clear statement of why that was the answer.

There are other users outside of the kernel tree that use LNet in addition
to just Lustre.  The Cray "DVS" I/O forwarding service[*] uses LNet, and
another experimental filesystem named Zest[+] also used LNet.

[*] https://www.alcf.anl.gov/files/Sugiyama-Wallace-Thursday16B-slides.pdf
[+] https://www.psc.edu/images/zest/zest-sc07-paper.pdf

> One reason that I would like to see lustre stay in drivers/staging (so I
> do not support Greg's patch) is that this sort of transition of Lustre
> to using an improved sunrpc/xprt would be much easier if both were in
> the same tree.  Certainly it would be easier for a larger community to
> be participating in the work.

I don't think the proposal to encapsulate all of the Lustre protocol into
pNFS made a lot of sense, since this would have only really been available
on Linux, at which point it would be better to use the native Lustre client
rather than funnel everything through pNFS.

However, _just_ using the LNet transport for (p)NFS might make sense.  LNet
is largely independent from Lustre (it used to be a separate source tree)
and is very efficient over the network.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation


Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread NeilBrown
On Fri, Jun 01 2018, Greg Kroah-Hartman wrote:

>
> So, let's just delete the whole mess.  Now the lustre developers can go
> off and work in their out-of-tree codebase and not have to worry about
> providing valid changelog entries and breaking their patches up into
> logical pieces.

I find it incredible that anyone would think that not having to "worry
about providing valid changelogs" and not "breaking their patches up
into logical pieces" could ever be seen as a good idea.  I hope that if
lustre development is excluded from mainline for a time, we can
still maintain the practices that demonstrably work so well.

For the record: I'm not in favor of ejecting this code from mainline.  I
think that the long-term result may be that it never comes back, and it
will likely at least delay the process.
But you must do what you think is best.

Thanks,
NeilBrown



Re: [lustre-devel] [PATCH] staging: lustre: delete the filesystem from the tree.

2018-06-04 Thread NeilBrown
On Fri, Jun 01 2018, Doug Oucharek wrote:

> Would it make sense to land LNet and LNDs on their own first?  Get
> the networking house in order first before layering on the file
> system?

I'd like to turn that question on its head:
  Do we need LNet and LNDs?  What value do they provide?
(this is a genuine question, not being sarcastic).

It has been a while since I tried to understand LNet, and even then it
was a fairly superficial look, but I think it is an abstraction layer
that provides packet-based send/receive with some numa-awareness
and routing functionality.  It sits over sockets (TCP) and IB and
provides a uniform interface.

That is almost a description of the xprt layer in sunrpc.  sunrpc
doesn't have routing, but it does have some numa awareness (for the
server side at least) and it definitely provides packet-based
send/receive over various transports - tcp, udp, local (unix domain),
and IB.
So: can we use sunrpc/xprt in place of LNet?  How much would we need to
enhance sunrpc/xprt for this to work?  What hooks would be needed to
implement the routing as a separate layer?

If LNet is, in some way, much better than sunrpc, then can we share that
superior functionality with our NFS friends by adding it to sunrpc?

Maybe the answer to this is "no", but I think LNet would be hard to sell
without a clear statement of why that was the answer.

One reason that I would like to see lustre stay in drivers/staging (so I
do not support Greg's patch) is that this sort of transition of Lustre
to using an improved sunrpc/xprt would be much easier if both were in
the same tree.  Certainly it would be easier for a larger community to
be participating in the work.

Thanks,
NeilBrown

