Re: [dm-devel] [PATCH v2 41/42] libmultipath: refuse reloading an existing map with different WWID

2020-08-14 Thread Martin Wilck
On Fri, 2020-08-14 at 09:48 +0200, Martin Wilck wrote:
> Hi Ben,
> 
> On Thu, 2020-08-13 at 17:23 -0500, Benjamin Marzinski wrote:
> > 
> > I'm also open to being convinced that it would be o.k. to rename a
> > device that we weren't planning on reloading, because it has a wrong
> > alias, which messes with the device that we were supposed to reload
> > (I assume this makes the code easier). Does that sound like a
> > reasonable solution?
> 
> I sounds ok.

s/I/It/ - sorry.


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



Re: [dm-devel] [PATCH v2 41/42] libmultipath: refuse reloading an existing map with different WWID

2020-08-14 Thread Martin Wilck
Hi Ben,

On Thu, 2020-08-13 at 17:23 -0500, Benjamin Marzinski wrote:
> On Wed, Aug 12, 2020 at 01:34:05PM +0200, mwi...@suse.com wrote:
> > From: Martin Wilck 
> > 
> > If a map with given name (alias) already exists with a different
> > WWID,
> > reloading it with a new WWID is harmful. The existing paths would
> > first
> > be replaced by others with the new WWID. The WWIDs of the new paths
> > wouldn't
> > match the map WWID; thus the next time the map is disassembled, the
> > (correct)
> > path WWIDs would be overwritten by the different map WWID (with
> > subsequent
> > patches from this series, they'd be orphaned instead, which is
> > better but
> > still not ideal). When the map is reloaded later, paths with
> > diffent WWIDs
> > may be mixed, causing data corruption.
> > 
> > Refuse reloading a map under such circumstances.
> > 
> > Note: this patch doesn't change multipath-tools behavior in the
> > case where valid map aliases are supposed to be swapped (typically
> > if the bindings file differs between initrd and root FS). In this
> > case, select_action() selects ACT_RENAME, which this patch doesn't
> > affect. The user-visible behavior in this case remains as-is: the
> > map renames fail, and the aliases remain unchanged, thus not
> > matching the current bindings file, but at least internally
> > consistent.
> > 
> > To fully fix this use case, coalesce_paths() must cease setting up
> > maps one by one. Instead, we'd need to build up a full set of
> > maps-to-be-set-up, and create them in a single batch, figuring out
> > a "rename strategy" beforehand to avoid clashes between existing
> > and to-be-created maps (for example, a name swap A<->B would need
> > to be carried out as B->tmp, A->B, tmp->A). Implementing that is
> > out of the scope of this patch series.
> > 
> 
> We need to do something about this, and this is an easy stop-gap, so
> I'm fine with this patch.  However, it might be better to check
> whether we can fall back to using the wwid for the alias (we should
> always be able to, unless people pick really bad aliases in
> /etc/multipath.conf). This will allow us to create the device.  It
> can always get renamed later, and all things being equal, it seems
> better to have a device with a wwid alias than to have no device at
> all.

Patch 80 from my series basically does that. There's currently no check
whether some other map mistakenly uses that WWID as alias, though.

> 
> Also, I do wonder if we can't still set up devices one at a time.  As
> long as we agree that it is unsupported to configure multipath so
> that two devices have the same alias, or to configure a device with
> an alias that matches another device's wwid, and that we can't
> guarantee what the devices will be named in that case (or even that
> their names won't change when commands are run), it seems doable.

There are potential conflicts in two directions: 1) with the future,
to-be-set-up set of maps, and 2) with the current, existing maps. This
is problematic per se, because it creates more apparent conflicts than
actually exist. The current code looks almost exclusively at conflicts
of type 2), but I think what actually matters more is type 1).

If we didn't set up one by one, we could examine the "future" setup for
possible conflicts, and ignore conflicts with the existing setup
entirely, except for calculating a setup strategy / ordering that would
avoid conflicts "on the way".

When we set up maps one by one, it gets even worse, because 1) is only
partially known while we set up the maps. When we set up a map, we
simply don't know which aliases the maps we haven't set up yet will be
configured to have. Both false positives and false negatives are
possible. My attempt to take conflicts of type 1) into account in the
v1 series was incorrect and rightfully rejected by you.

> If we are reloading all the devices, and there is an A<->B alias
> swap, when we reload A (assume that's the first one we see, with a
> wwid of WWID1) we notice that there already is an existing device
> named B (with a wwid of WWID2). Before continuing with the reload of
> A, we rename B to WWID2 (which should work; if not, we refuse the
> reload of A). When we later get to the device now named WWID2, we can
> rename it to A.

This is one possible approach, yes. Indeed I considered a more radical
idea, renaming *all* maps to either WWID or random names before trying
to set up the final set of maps. That would avoid name clashes "on the
way" by definition, and avoid rename-specific error handling. But it
would issue a lot of unnecessary rename ioctls.

The other option would be bulk renaming with a pre-calculated renaming
strategy. I believe there are two basic renaming sequences, one being a
"linear" sequence A->B->C->...->X->N, where N is a previously unused
alias, and one a cycle A->B->C->...->X->A (the swap is a special case
of the cycle). The linear case can be resolved easily by renaming
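The chain/cycle decomposition Martin describes can be sketched in a few
lines of Python. This is purely illustrative, not multipath-tools code;
the "mpath-tmp-N" temporary aliases are invented for the example. Chains
are renamed tail-first (the rename whose target is free goes first), and
cycles are broken by moving one member to a temporary name.

```python
def plan_renames(desired):
    """Compute an ordered list of (old, new) rename operations, given
    'desired' mapping each current map name to its target name, such
    that no step renames onto a name still occupied by another map."""
    pending = {old: new for old, new in desired.items() if old != new}
    in_use = set(desired)          # names currently occupied by a map
    ops = []
    tmp_count = 0
    while pending:
        # A rename is safe when its target name is currently free.
        free = [old for old, new in pending.items() if new not in in_use]
        if free:
            old = free[0]
            new = pending.pop(old)
        else:
            # All targets occupied: we are inside a cycle (e.g. A<->B).
            # Break it: move one member out of the way first
            # (B->tmp, A->B, tmp->A in the commit-message example).
            old = next(iter(pending))
            new = f"mpath-tmp-{tmp_count}"   # hypothetical temp alias
            tmp_count += 1
            pending[new] = pending.pop(old)  # real rename happens later
        ops.append((old, new))
        in_use.discard(old)
        in_use.add(new)
    return ops
```

For the A<->B swap this yields three steps (one via a temporary name);
for a linear chain A->B, B->C it yields the two renames in reverse
chain order, with no temporary needed.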

Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Meneghini, John
Mike, 

I don't see your patch at:

https://patchwork.kernel.org/project/linux-block/list/?submitter=1643

Can you please upload this patch there?

Thanks,

/John

On 8/13/20, 10:48 AM, "Mike Snitzer" wrote:

Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
status"), among other things, fixed NVME_SC_CMD_INTERRUPTED error
handling by changing multipathing's nvme_failover_req() to short-circuit
path failover and then fallback to NVMe's normal error handling (which
takes care of NVME_SC_CMD_INTERRUPTED).

This detour through native NVMe multipathing code is unwelcome because
it prevents NVMe core from handling NVME_SC_CMD_INTERRUPTED independent
of any multipathing concerns.

Introduce nvme_status_needs_local_error_handling() to prioritize
non-failover retry, when appropriate, in terms of normal NVMe error
handling.  nvme_status_needs_local_error_handling() will naturally evolve
to include handling of any other errors that normal error handling must
be used for.

nvme_failover_req()'s ability to fallback to normal NVMe error handling
has been preserved because it may be useful for future NVME_SC that
nvme_status_needs_local_error_handling() hasn't been trained for yet.

Signed-off-by: Mike Snitzer 
---
 drivers/nvme/host/core.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 88cff309d8e4..be749b690af7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -252,6 +252,16 @@ static inline bool nvme_req_needs_retry(struct request *req)
return true;
 }

+static inline bool nvme_status_needs_local_error_handling(u16 status)
+{
+   switch (status & 0x7ff) {
+   case NVME_SC_CMD_INTERRUPTED:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static void nvme_retry_req(struct request *req)
 {
struct nvme_ns *ns = req->q->queuedata;
@@ -270,7 +280,8 @@ static void nvme_retry_req(struct request *req)

 void nvme_complete_rq(struct request *req)
 {
-   blk_status_t status = nvme_error_status(nvme_req(req)->status);
+   u16 nvme_status = nvme_req(req)->status;
+   blk_status_t status = nvme_error_status(nvme_status);

trace_nvme_complete_rq(req);

@@ -280,7 +291,8 @@ void nvme_complete_rq(struct request *req)
nvme_req(req)->ctrl->comp_seen = true;

if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-   if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
+   if (!nvme_status_needs_local_error_handling(nvme_status) &&
+   (req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
return;

if (!blk_queue_dying(req->q)) {
--
2.18.0




Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Sagi Grimberg




+   switch (nvme_req_disposition(req)) {
+   case COMPLETE:
+   nvme_complete_req(req);


nvme_complete_rq calling nvme_complete_req... Maybe call it
__nvme_complete_rq instead?


That's what I had first, but it felt so strangely out of place next
to the other nvme_*_req calls..

Maybe nvme_complete_rq needs to be renamed - what about nvme_req_done?


I'd suggest to call nvme_complete_rq nvme_end_rq because it really calls
blk_mq_end_request. That would confuse with nvme_end_request which
calls blk_mq_complete_request... Maybe we need some naming improvements
throughout.


Maybe call nvme_req_disposition again locally here to not carry
the is_ana_status. But not a biggy..


That means it would have to become non-static in scope, limiting
inlining possibilities, etc.


I see.




Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 12, 2020, at 11:51 AM, James Bottomley wrote:
> 
> On Wed, 2020-08-12 at 10:15 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 11:53 AM, James Bottomley wrote:
>>> 
>>> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
> [...]
> 
> and what is nice to have to speed up the verification
> process.  The choice for the latter is cache or reconstruct
> depending on the resources available.  If the tree gets cached
> on the server, that would be a server implementation detail
> invisible to the client.
 
 We assume that storage targets (for block or file) are not
 trusted. Therefore storage clients cannot rely on intermediate
 results (eg, middle nodes in a Merkle tree) unless those results
 are generated within the client's trust envelope.
>>> 
>>> Yes, they can ... because supplied nodes can be verified.  That's
>>> the whole point of a merkle tree.  As long as I'm sure of the root
>>> hash I can verify all the rest even if supplied by an untrusted
>>> source.  If you consider a simple merkle tree covering 4 blocks:
>>> 
>>>        R
>>>      /   \
>>>   H11     H12
>>>   / \     / \
>>> H21 H22 H23 H24
>>>  |   |   |   |
>>> B1  B2  B3  B4
>>> 
>>> Assume I have the verified root hash R.  If you supply B3 you also
>>> supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
>>> then hash H23 and H24 to produce H12 and if H12 and your supplied
>>> H11 hash to R the tree is correct and the B3 you supplied must
>>> likewise be correct.
>> 
>> I'm not sure what you are proving here. Obviously this has to work
>> in order for a client to reconstruct the file's Merkle tree given
>> only R and the file content.
> 
> You implied the server can't be trusted to generate the merkel tree. 
> I'm showing above it can because of the tree path based verification.

What I was implying is that clients can't trust intermediate Merkle
tree content that is not also signed. So far we are talking about
signing only the tree root.

The storage server can store the tree durably, but if the intermediate
parts of the tree are not signed, the client has to verify them anyway,
and that reduces the value of storing potentially large data structures.


>> It's the construction of the tree and verification of the hashes that
>> are potentially expensive. The point of caching intermediate hashes
>> is so that the client verifies them as few times as possible.  I
>> don't see value in caching those hashes on an untrusted server --
>> the client will have to reverify them anyway, and there will be no
>> savings.
> 
> I'm not making any claim about server caching, I'm just saying the
> client can request pieces of the tree from the server without having to
> reconstruct the whole thing itself because it can verify their
> correctness.

To be clear, my concern is about how much of the tree might be stored
in a Merkle-based metadata format. I just don't see that it has much
value to store more than the signed tree root, because the client will
have to reconstitute or verify some tree contents on most every read.

For sufficiently large files, the tree itself can be larger than what
can be stored in an xattr. This is the same problem that fs-verity
faces. And, as I stated earlier, xattr objects are read in their
entirety, they can't be seeked into or read piecemeal.
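To put rough numbers on that: with an fs-verity-style layout (4 KiB
blocks, SHA-256, so 128 hashes per tree block), the tree is about
1/127th of the file size, far beyond what a single xattr can hold. A
back-of-the-envelope sketch, with the parameters as stated assumptions:

```python
import math

BLOCK = 4096     # bytes covered per leaf hash (fs-verity-style)
DIGEST = 32      # SHA-256 digest size in bytes

def merkle_tree_bytes(file_size: int) -> int:
    """Total size of all Merkle tree levels (excluding the data
    itself) for a file of 'file_size' bytes, hashes packed
    BLOCK/DIGEST = 128 per tree block."""
    level = math.ceil(file_size / BLOCK)   # number of leaf hashes
    total = 0
    while level > 1:
        total += level * DIGEST
        # Parent level: each tree block holds BLOCK/DIGEST hashes.
        level = math.ceil(level * DIGEST / BLOCK)
    return total

# A 1 GiB file needs ~8 MiB of tree; Linux caps a single xattr value
# at 64 KiB (XATTR_SIZE_MAX), so only the signed root comfortably fits.
print(merkle_tree_bytes(1 << 30))   # 8454656
```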

What it seemed to me that you were suggesting was an offloaded cache
of the Merkle tree. Either the whole tree is stored on the storage
server, or the storage server provides a service that reconstitutes
that tree on behalf of clients. (Please correct me if I misunderstood).
I just don't think that will be practicable or provide the kind of
benefit you might want.


>> Cache once, as close as you can to where the data will be used.
>> 
>> 
 So: if the storage target is considered inside the client's trust
 envelope, it can cache or store durably any intermediate parts of
 the verification process. If not, the network and file storage is
 considered untrusted, and the client has to rely on nothing but
 the signed digest of the tree root.
 
 We could build a scheme around, say, fscache, that might save the
 intermediate results durably and locally.
>>> 
>>> I agree we want caching on the client, but we can always page in
>>> from the remote as long as we page enough to verify up to R, so
>>> we're always sure the remote supplied genuine information.
>> 
>> Agreed.
>> 
>> 
>> For this reason, the idea was to save only the signature of
>> the tree's root on durable storage. The client would retrieve
>> that signature possibly at open time, and reconstruct the
>> tree at that time.
> 
> Right that's the integrity data you must have.
> 
>> Or the tree could be partially constructed on-demand at the
>> time each unit is to be checked (say, as part of 2. above).
> 
> Whether it's reconstructed or cached can be an implementation
> 

Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 13, 2020, at 10:42 AM, James Bottomley wrote:
> 
> On Thu, 2020-08-13 at 10:21 -0400, Chuck Lever wrote:
>>> On Aug 12, 2020, at 11:42 AM, James Bottomley wrote:
> [...]
>>> For most people the security mechanism of local xattrs is
>>> sufficient.  If you're paranoid, you don't believe it is and you
>>> use EVM.
>> 
>> When IMA metadata happens to be stored in local filesystems in
>> a trusted xattr, it's going to enjoy the protection you describe
>> without needing the addition of a cryptographic signature.
>> 
>> However, that metadata doesn't live its whole life there. It
>> can reside in a tar file, it can cross a network, it can live
>> on a back-up tape. I think we agree that any time that metadata
>> is in transit or at rest outside of a Linux local filesystem, it
>> is exposed.
>> 
>> Thus I'm interested in a metadata protection mechanism that does
>> not rely on the security characteristics of a particular storage
>> container. For me, a cryptographic signature fits that bill
>> nicely.
> 
> Sure, but one of the points about IMA is a separation of mechanism from
> policy.  Signed hashes (called appraisal in IMA terms) is just one
> policy you can decide to require or not or even make it conditional on
> other things.

AFAICT, the current EVM_IMA_DIGSIG and EVM_PORTABLE_DIGSIG formats are
always signed. The policy choice is whether or not to verify the
signature, not whether or not the metadata format is signed.


--
Chuck Lever
chuckle...@gmail.com







Re: [dm-devel] Promise and ALUA

2020-08-14 Thread McIntyre, Vincent (CASS, Marsfield)
On Thu, Aug 13, 2020 at 10:07:49PM +0200, Xose Vazquez Perez wrote:
>On 8/10/20 6:33 AM, McIntyre, Vincent (CASS, Marsfield) wrote:
>
>>for many years we have been operating some Promise VTrak arrays
>>without any use of the ALUA feature (largely so we don't have to
>>specify LUN affinities as well, which seems to be required).
>>
>>In the process of upgrading to Debian Buster
>>(multipath-tools 0.7.9 and kernel 4.19)
>>I find that I can no longer connect to our Promise arrays.
>>They are detected but the only useful output I get is
>>
>>  multipathd[986]: reconfigure (operator)
>>  multipathd[986]: sdc: alua not supported
>>  multipathd[986]: sdd: alua not supported
>>  multipathd[986]: sdr: alua not supported
>>  multipathd[986]: sde: alua not supported
>>  multipathd[986]: sdf: alua not supported
>>  multipathd[986]: sdg: alua not supported
>>  multipathd[986]: sdh: alua not supported
>>  multipathd[986]: sdi: alua not supported
>>
>>
>>I found the note in the manpage about alua being selected by
>>default for these arrays[1], but I'm taken aback that I'm not
>>allowed to override this.
>>
>>Is there really no support any more for choosing whether to use
>>ALUA or not?
>>
>I have tried messing about with detect_prio, detect_checker
>and whatnot, to no avail.
>
>>[1] 9b5ea2eda85ae072cb697310807611c693713c2b
>> libmultipath: retain_attached_hw_handler obsolete with 4.3+
>
>With the next array config and an empty /etc/multipath.conf,
>reboot the linux host and put the output of "multipath -ll"
>Redundancy Type: Active-Active
>LUN Affinity: Enable
>ALUA: Enable

TL;DR I found a way forward, manually running multipath -a.
Details below, and a suggested tweak to the manpage.

I created a new LUN on the vtrak and mapped it to the test host.
The host sees it, but as expected there's no new multipath yet.
lsscsi does not show a new set of scsi devices either,
which is also expected.

 qla2xxx [:04:00.0]-107ff:1: qla2x00_fcport_event_handler: schedule
 qla2xxx [:04:00.0]-107ff:1: qla_scan_work_fn: schedule loop resync
 qla2xxx [:04:00.0]-280e:1: HBA in F P2P topology.
 qla2xxx [:04:00.0]-2814:1: Configure loop -- dpc flags = 0x60.
 qla2xxx [:04:00.0]-107ff:1: qla2x00_fcport_event_handler: schedule
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260700015553ce36 portid=1e0260.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260600015553ce36 portid=1e0280.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260200015553ce36 portid=1e02a0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260300015553ce36 portid=1e02c0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260100015553ce36 portid=1e0300.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
26015553ce36 portid=1e0320.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260500015553ce36 portid=1e0340.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260400015553ce36 portid=1e0360.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 2024ff002350 pn 
2124ff002350 portid=1e0500.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
203300a098b16b5f portid=1e08a0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
203200a098b16b5f portid=1e08a1.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
206300a098b16b5f portid=1e08c0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
204200a098b16b5f portid=1e08c1.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
204300a098b16b5f portid=1e0900.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
205200a098b16b5f portid=1e0920.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
205300a098b16b5f portid=1e0940.
 qla2xxx [:04:00.0]-107ff:1: qla_scan_work_fn: schedule loop resync
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 200200a098b16b5f pn 
206200a098b16b5f portid=1e0960.
 qla2xxx [:04:00.0]-286a:1: qla2x00_configure_loop *** FAILED ***.
 qla2xxx [:04:00.0]-280e:1: HBA in F P2P topology.
 qla2xxx [:04:00.0]-2814:1: Configure loop -- dpc flags = 0x60.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260700015553ce36 portid=1e0260.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260600015553ce36 portid=1e0280.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260200015553ce36 portid=1e02a0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260300015553ce36 portid=1e02c0.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
260100015553ce36 portid=1e0300.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 25015553ce36 pn 
26015553ce36 portid=1e0320.
 qla2xxx [:04:00.0]-2858:1: GID_PT entry - nn 250100015553ce36 pn 
260500015553ce36 portid=1e0340.
 

Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 11, 2020, at 11:32 AM, James Bottomley wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 1:43 AM, James Bottomley wrote:
>>> On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
> [...]
 Thanks for the help! I just want to emphasize that documentation
 (eg, a specification) will be critical for remote filesystems.
 
 If any of this is to be supported by a remote filesystem, then we
 need an unencumbered description of the new metadata format
 rather than code. GPL-encumbered formats cannot be contributed to
 the NFS standard, and are probably difficult for other
 filesystems that are not Linux-native, like SMB, as well.
>>> 
>>> I don't understand what you mean by GPL encumbered formats.  The
>>> GPL is a code licence not a data or document licence.
>> 
>> IETF contributions occur under a BSD-style license incompatible
>> with the GPL.
>> 
>> https://trustee.ietf.org/trust-legal-provisions.html
>> 
>> Non-Linux implementers (of OEM storage devices) rely on such
>> standards processes to indemnify them against licensing claims.
> 
> Well, that simply means we won't be contributing the Linux
> implementation, right?

At the present time, there is nothing but the Linux implementation.
There's no English description, there's no specification of the
formats, the format is described only by source code.

The only way to contribute current IMA metadata formats to an open
standards body like the IETF is to look at encumbered code first.
We would effectively be contributing an implementation in this case.

(I'm not saying the current formats should or should not be
contributed; merely that there is a legal stumbling block to doing
so that can be avoided for newly defined formats).


> Well, let me put the counterpoint: I can write a book about how linux
> device drivers work (which includes describing the data formats)


Our position is that someone who reads that book and implements those
formats under a non-GPL-compatible license would be in breach of the
GPL.

The point of the standards process is to indemnify implementing
and distributing under _any_ license what has been published by the
standards body. That legally enables everyone to use the published
protocol/format in their own code no matter how it happens to be
licensed.


> Fine, good grief, people who take a sensible view of this can write the
> data format down and publish it under any licence you like then you can
> pick it up again safely.


That's what I proposed. Write it down under the IETF Trust legal
provisions license. And I volunteered to do that.

All I'm saying is that description needs to come before code.


--
Chuck Lever
chuckle...@gmail.com







Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 11, 2020, at 11:53 AM, James Bottomley wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 1:43 AM, James Bottomley wrote:
>>> 
>>> On Mon, 2020-08-10 at 19:36 -0400, Chuck Lever wrote:
> On Aug 10, 2020, at 11:35 AM, James Bottomley wrote:
> [...]
> The first basic is that a merkle tree allows unit at a time
> verification. First of all we should agree on the unit.  Since
> we always fault a page at a time, I think our merkle tree unit
> should be a page not a block.
 
 Remote filesystems will need to agree that the size of that unit
 is the same everywhere, or the unit size could be stored in the
 per-file metadata.
 
 
> Next, we should agree where the check gates for the per page
> accesses should be ... definitely somewhere in readpage, I
> suspect and finally we should agree how the merkle tree is
> presented at the gate.  I think there are three ways:
> 
> 1. Ahead of time transfer:  The merkle tree is transferred and
>    verified at some time before the accesses begin, so we already
>    have a verified copy and can compare against the lower leaf.
> 2. Async transfer:  We provide an async mechanism to transfer the
>    necessary components, so when presented with a unit, we check
>    the log n components required to get to the root.
> 3. The protocol actually provides the capability of 2 (like the
>    SCSI DIF/DIX), so to IMA all the pieces get presented instead
>    of IMA having to manage the tree.
 
 A Merkle tree is potentially large enough that it cannot be
 stored in an extended attribute. In addition, an extended
 attribute is not a byte stream that you can seek into or read
 small parts of, it is retrieved in a single shot.
>>> 
>>> Well you wouldn't store the tree would you, just the head
>>> hash.  The rest of the tree can be derived from the data.  You need
>>> to distinguish between what you *must* have to verify integrity
>>> (the head hash, possibly signed)
>> 
>> We're dealing with an untrusted storage device, and for a remote
>> filesystem, an untrusted network.
>> 
>> Mimi's earlier point is that any IMA metadata format that involves
>> unsigned digests is exposed to an alteration attack at rest or in
>> transit, thus will not provide a robust end-to-end integrity
>> guarantee.
>> 
>> Therefore, tree root digests must be cryptographically signed to be
>> properly protected in these environments. Verifying that signature
>> should be done infrequently relative to reading a file's content.
> 
> I'm not disagreeing there has to be a way for the relying party to
> trust the root hash.
> 
>>> and what is nice to have to speed up the verification
>>> process.  The choice for the latter is cache or reconstruct
>>> depending on the resources available.  If the tree gets cached on
>>> the server, that would be a server implementation detail invisible
>>> to the client.
>> 
>> We assume that storage targets (for block or file) are not trusted.
>> Therefore storage clients cannot rely on intermediate results (eg,
>> middle nodes in a Merkle tree) unless those results are generated
>> within the client's trust envelope.
> 
> Yes, they can ... because supplied nodes can be verified.  That's the
> whole point of a merkle tree.  As long as I'm sure of the root hash I
> can verify all the rest even if supplied by an untrusted source.  If
> you consider a simple merkle tree covering 4 blocks:
> 
>        R
>      /   \
>   H11     H12
>   / \     / \
> H21 H22 H23 H24
>  |   |   |   |
> B1  B2  B3  B4
> 
> Assume I have the verified root hash R.  If you supply B3 you also
> supply H24 and H11 as proof.  I verify by hashing B3 to produce H23
> then hash H23 and H24 to produce H12 and if H12 and your supplied H11
> hash to R the tree is correct and the B3 you supplied must likewise be
> correct.

I'm not sure what you are proving here. Obviously this has to work
in order for a client to reconstruct the file's Merkle tree given
only R and the file content.

It's the construction of the tree and verification of the hashes that
are potentially expensive. The point of caching intermediate hashes
is so that the client verifies them as few times as possible.  I
don't see value in caching those hashes on an untrusted server --
the client will have to reverify them anyway, and there will be no
savings.

Cache once, as close as you can to where the data will be used.
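James's B3 walk-through above can be checked mechanically. A toy sketch
of the proof check, assuming SHA-256 over simple concatenation (purely
illustrative; not any particular on-disk or wire format):

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper: plain SHA-256 (illustrative choice)."""
    return hashlib.sha256(data).digest()

# Build the 4-block tree from the diagram in the thread.
blocks = [b"B1", b"B2", b"B3", b"B4"]
H21, H22, H23, H24 = (h(b) for b in blocks)
H11 = h(H21 + H22)
H12 = h(H23 + H24)
R = h(H11 + H12)        # the verified (e.g. signed) root hash

def verify_b3(b3: bytes, proof_h24: bytes, proof_h11: bytes,
              root: bytes) -> bool:
    """Verify block 3 given the two sibling hashes named in the
    thread: hash B3 -> H23, combine with H24 -> H12, combine with
    H11 -> R, and compare against the trusted root."""
    h23 = h(b3)
    h12 = h(h23 + proof_h24)
    return h(proof_h11 + h12) == root

assert verify_b3(b"B3", H24, H11, R)        # genuine block verifies
assert not verify_b3(b"EVIL", H24, H11, R)  # tampered block fails
```

This is the point both sides agree on: the sibling hashes may come from
an untrusted server, because any tampering with B3, H24, or H11 changes
the recomputed root and the comparison fails.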


>> So: if the storage target is considered inside the client's trust
>> envelope, it can cache or store durably any intermediate parts of
>> the verification process. If not, the network and file storage is
>> considered untrusted, and the client has to rely on nothing but the
>> signed digest of the tree root.
>> 
>> We could build a scheme around, say, fscache, that might save the
>> intermediate results 

[dm-devel] [PATCH 1/3] IMA: generalize keyring specific measurement constructs

2020-08-14 Thread Tushar Sugandhi
IMA functions such as ima_match_keyring(), process_buffer_measurement(),
ima_match_policy() etc. handle data specific to keyrings. Currently,
these constructs are not generic enough to handle other func-specific
data, which makes them harder to extend without code duplication.

Refactor the keyring specific measurement constructs to be generic and
reusable in other measurement scenarios. Rename the parameter "size"
in process_buffer_measurement() to "buf_len" to indicate it is the
length of the buffer pointed by the parameter "buf".

Signed-off-by: Tushar Sugandhi 
---
 security/integrity/ima/ima.h| 11 
 security/integrity/ima/ima_api.c|  6 ++---
 security/integrity/ima/ima_main.c   | 17 ++--
 security/integrity/ima/ima_policy.c | 40 +
 4 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 38043074ce5e..e2a151d6653d 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -255,7 +255,7 @@ static inline void ima_process_queued_keys(void) {}
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
   struct ima_template_desc **template_desc,
-  const char *keyring);
+  const char *func_data);
 int ima_must_measure(struct inode *inode, int mask, enum ima_hooks func);
 int ima_collect_measurement(struct integrity_iint_cache *iint,
struct file *file, void *buf, loff_t size,
@@ -265,9 +265,10 @@ void ima_store_measurement(struct integrity_iint_cache 
*iint, struct file *file,
   struct evm_ima_xattr_data *xattr_value,
   int xattr_len, const struct modsig *modsig, int pcr,
   struct ima_template_desc *template_desc);
-void process_buffer_measurement(struct inode *inode, const void *buf, int size,
-   const char *eventname, enum ima_hooks func,
-   int pcr, const char *keyring);
+void process_buffer_measurement(struct inode *inode, const void *buf,
+   int buf_len, const char *eventname,
+   enum ima_hooks func, int pcr,
+   const char *func_data);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
 int ima_alloc_init_template(struct ima_event_data *event_data,
@@ -283,7 +284,7 @@ const char *ima_d_path(const struct path *path, char 
**pathbuf, char *filename);
 int ima_match_policy(struct inode *inode, const struct cred *cred, u32 secid,
 enum ima_hooks func, int mask, int flags, int *pcr,
 struct ima_template_desc **template_desc,
-const char *keyring);
+const char *func_data);
 void ima_init_policy(void);
 void ima_update_policy(void);
 void ima_update_policy_flag(void);
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 4f39fb93f278..af218babd198 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -170,7 +170,7 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
  * @func: caller identifier
  * @pcr: pointer filled in if matched measure policy sets pcr=
  * @template_desc: pointer filled in if matched measure policy sets template=
- * @keyring: keyring name used to determine the action
+ * @func_data: private data specific to @func, can be NULL.
  *
  * The policy is defined in terms of keypairs:
  * subj=, obj=, type=, func=, mask=, fsmagic=
@@ -186,14 +186,14 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
 int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
   int mask, enum ima_hooks func, int *pcr,
   struct ima_template_desc **template_desc,
-  const char *keyring)
+  const char *func_data)
 {
int flags = IMA_MEASURE | IMA_AUDIT | IMA_APPRAISE | IMA_HASH;
 
flags &= ima_policy_flag;
 
return ima_match_policy(inode, cred, secid, func, mask, flags, pcr,
-   template_desc, keyring);
+   template_desc, func_data);
 }
 
 /*
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 8a91711ca79b..a8740b7ea417 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -728,17 +728,18 @@ int ima_load_data(enum kernel_load_data_id id)
  * process_buffer_measurement - Measure the buffer to ima log.
  * @inode: inode associated with the object being measured (NULL for KEY_CHECK)
  * @buf: pointer to the buffer that needs to be added to the log.
- * @size: size of buffer(in bytes).
+ * @buf_len: length of buffer (in bytes).

[dm-devel] [PATCH 3/3] IMA: define IMA hook to measure critical data from kernel components

2020-08-14 Thread Tushar Sugandhi
Currently, IMA does not provide a generic function for kernel components
to measure their data. A generic function provided by IMA would make it
easier and faster for various parts of the kernel to adopt the IMA
infrastructure, would avoid code duplication, and would ensure consistent
usage of the IMA policy CRITICAL_DATA+data_sources across the kernel.

Define a generic IMA function ima_measure_critical_data() to measure
data from various kernel components. Limit the measurement to the
components that are specified in the IMA policy rule
CRITICAL_DATA+data_sources. Update process_buffer_measurement() to
return the status code of the operation.

Signed-off-by: Tushar Sugandhi 
---
 include/linux/ima.h   |  9 
 security/integrity/ima/ima.h  |  8 +++
 security/integrity/ima/ima_main.c | 37 ---
 3 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index d15100de6cdd..865332ecedcb 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
 extern void ima_post_path_mknod(struct dentry *dentry);
 extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
 extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
+extern int ima_measure_critical_data(const char *event_name,
+const char *event_data_source,
+const void *buf, int buf_len);
 
 #ifdef CONFIG_IMA_KEXEC
 extern void ima_add_kexec_buffer(struct kimage *image);
@@ -104,6 +107,12 @@ static inline int ima_file_hash(struct file *file, char *buf, size_t buf_size)
 }
 
 static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int size) {}
+static inline int ima_measure_critical_data(const char *event_name,
+   const char *event_data_source,
+   const void *buf, int buf_len)
+{
+   return -EOPNOTSUPP;
+}
 #endif /* CONFIG_IMA */
 
 #ifndef CONFIG_IMA_KEXEC
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 99773dfa2541..e65ab067e700 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -266,10 +266,10 @@ void ima_store_measurement(struct integrity_iint_cache *iint, struct file *file,
   struct evm_ima_xattr_data *xattr_value,
   int xattr_len, const struct modsig *modsig, int pcr,
   struct ima_template_desc *template_desc);
-void process_buffer_measurement(struct inode *inode, const void *buf,
-   int buf_len, const char *eventname,
-   enum ima_hooks func, int pcr,
-   const char *func_data);
+int process_buffer_measurement(struct inode *inode, const void *buf,
+  int buf_len, const char *eventname,
+  enum ima_hooks func, int pcr,
+  const char *func_data);
 void ima_audit_measurement(struct integrity_iint_cache *iint,
   const unsigned char *filename);
 int ima_alloc_init_template(struct ima_event_data *event_data,
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index a8740b7ea417..129bcaaf13e2 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -736,10 +736,11 @@ int ima_load_data(enum kernel_load_data_id id)
  *
  * Based on policy, the buffer is measured into the ima log.
  */
-void process_buffer_measurement(struct inode *inode, const void *buf,
-   int buf_len, const char *eventname,
-   enum ima_hooks func, int pcr,
-   const char *func_data)
+
+int process_buffer_measurement(struct inode *inode, const void *buf,
+  int buf_len, const char *eventname,
+  enum ima_hooks func, int pcr,
+  const char *func_data)
 {
int ret = 0;
const char *audit_cause = "ENOMEM";
@@ -759,7 +760,7 @@ void process_buffer_measurement(struct inode *inode, const void *buf,
u32 secid;
 
if (!ima_policy_flag)
-   return;
+   return 0;
 
/*
 * Both LSM hooks and auxilary based buffer measurements are
@@ -773,7 +774,7 @@ void process_buffer_measurement(struct inode *inode, const void *buf,
action = ima_get_action(inode, current_cred(), secid, 0, func,
, , func_data);
if (!(action & IMA_MEASURE))
-   return;
+   return 0;
}
 
if (!pcr)
@@ -788,7 +789,7 @@ void process_buffer_measurement(struct inode *inode, const void *buf,

Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 11, 2020, at 2:28 PM, James Bottomley wrote:
> 
> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
>> Mimi's earlier point is that any IMA metadata format that involves
>> unsigned digests is exposed to an alteration attack at rest or in
>> transit, thus will not provide a robust end-to-end integrity
>> guarantee.
> 
> I don't believe that is Mimi's point, because it's mostly not correct:
> the xattr mechanism does provide this today.  The point is the
> mechanism we use for storing IMA hashes and signatures today is xattrs
> because they have robust security properties for local filesystems that
> the kernel enforces.  This use goes beyond IMA, selinux labels for
> instance use this property as well.

I don't buy this for a second. If storing a security label in a
local xattr is so secure, we wouldn't have any need for EVM.


> What I think you're saying is that NFS can't provide the robust
> security for xattrs we've been relying on, so you need some other
> mechanism for storing them.

For NFS, there's a network traversal which is an attack surface.

A local xattr can be attacked as well: a device or bus malfunction
can corrupt the content of an xattr, or a privileged user can modify
it.

How does that metadata get from the software provider to the end
user? It's got to go over a network, stored in various ways, some
of which will not be trusted. To attain an unbroken chain of
provenance, that metadata has to be signed.

I don't think the question is the storage mechanism, but rather the
protection mechanism. Signing the metadata protects it in all of
these cases.


> I think Mimi's other point is actually that IMA uses a flat hash which
> we derive by reading the entire file and then watching for mutations. 
> Since you cannot guarantee we get notice of mutation with NFS, the
> entire IMA mechanism can't really be applied in its current form and we
> have to resort to chunk at a time verifications that a Merkle tree
> would provide.

I'm not sure what you mean by this. An NFS client relies on notification
of mutation to maintain the integrity of its cache of NFS file content,
and it's done that since the 1980s.

In addition to examining a file's mtime and ctime as maintained by
the NFS server, a client can rely on the file's NFSv4 change attribute
or an NFSv4 delegation.


> Doesn't this make moot any thinking about
> standardisation in NFS for the current IMA flat hash mechanism because
> we simply can't use it ... If I were to construct a prototype I'd have
> to work out and securely cache the hash of ever chunk when verifying
> the flat hash so I could recheck on every chunk read.  I think that's
> infeasible for large files.
> 
> James
> 

--
Chuck Lever
chuckle...@gmail.com




--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH 2/3] IMA: add policy to support measuring critical data from kernel components

2020-08-14 Thread Tushar Sugandhi
There are several candidate kernel components suitable for IMA
measurement, but not all of them are enlightened for it. Moreover,
system administrators may not want to measure data for all of them,
even when they are enlightened for IMA measurement. An IMA policy
specific to the various kernel components is therefore needed to
measure their respective critical data.

Add a new IMA policy CRITICAL_DATA+data_sources to support measuring
various critical kernel components. This policy enables system
administrators to limit the measurement to specific components,
provided those components are enlightened for IMA measurement.

Signed-off-by: Tushar Sugandhi 
---
 Documentation/ABI/testing/ima_policy |  6 +-
 security/integrity/ima/ima.h |  1 +
 security/integrity/ima/ima_api.c |  2 +-
 security/integrity/ima/ima_policy.c  | 84 ++--
 4 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/Documentation/ABI/testing/ima_policy b/Documentation/ABI/testing/ima_policy
index cd572912c593..a0dd0f108555 100644
--- a/Documentation/ABI/testing/ima_policy
+++ b/Documentation/ABI/testing/ima_policy
@@ -29,7 +29,7 @@ Description:
base:   func:= 
[BPRM_CHECK][MMAP_CHECK][CREDS_CHECK][FILE_CHECK][MODULE_CHECK]
[FIRMWARE_CHECK]
[KEXEC_KERNEL_CHECK] [KEXEC_INITRAMFS_CHECK]
-   [KEXEC_CMDLINE] [KEY_CHECK]
+   [KEXEC_CMDLINE] [KEY_CHECK] [CRITICAL_DATA]
mask:= [[^]MAY_READ] [[^]MAY_WRITE] [[^]MAY_APPEND]
   [[^]MAY_EXEC]
fsmagic:= hex value
@@ -125,3 +125,7 @@ Description:
keys added to .builtin_trusted_keys or .ima keyring:
 
measure func=KEY_CHECK keyrings=.builtin_trusted_keys|.ima
+
+   Example of measure rule using CRITICAL_DATA to measure critical data
+
+   measure func=CRITICAL_DATA data_sources=selinux|apparmor|dm-crypt
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e2a151d6653d..99773dfa2541 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -200,6 +200,7 @@ static inline unsigned int ima_hash_key(u8 *digest)
hook(POLICY_CHECK, policy)  \
hook(KEXEC_CMDLINE, kexec_cmdline)  \
hook(KEY_CHECK, key)\
+   hook(CRITICAL_DATA, critical_data)  \
hook(MAX_CHECK, none)
 
 #define __ima_hook_enumify(ENUM, str)  ENUM,
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index af218babd198..9917e1730cb6 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -176,7 +176,7 @@ void ima_add_violation(struct file *file, const unsigned char *filename,
  * subj=, obj=, type=, func=, mask=, fsmagic=
  * subj,obj, and type: are LSM specific.
  * func: FILE_CHECK | BPRM_CHECK | CREDS_CHECK | MMAP_CHECK | MODULE_CHECK
- * | KEXEC_CMDLINE | KEY_CHECK
+ * | KEXEC_CMDLINE | KEY_CHECK | CRITICAL_DATA
  * mask: contains the permission mask
  * fsmagic: hex value
  *
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 4efaf8956eb8..8451ccb2a775 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -22,17 +22,18 @@
 #include "ima.h"
 
 /* flags definitions */
-#define IMA_FUNC   0x0001
-#define IMA_MASK   0x0002
-#define IMA_FSMAGIC0x0004
-#define IMA_UID0x0008
-#define IMA_FOWNER 0x0010
-#define IMA_FSUUID 0x0020
-#define IMA_INMASK 0x0040
-#define IMA_EUID   0x0080
-#define IMA_PCR0x0100
-#define IMA_FSNAME 0x0200
-#define IMA_KEYRINGS   0x0400
+#define IMA_FUNC   0x0001
+#define IMA_MASK   0x0002
+#define IMA_FSMAGIC0x0004
+#define IMA_UID0x0008
+#define IMA_FOWNER 0x0010
+#define IMA_FSUUID 0x0020
+#define IMA_INMASK 0x0040
+#define IMA_EUID   0x0080
+#define IMA_PCR0x0100
+#define IMA_FSNAME 0x0200
+#define IMA_KEYRINGS   0x0400
+#define IMA_DATA_SOURCES   0x0800
 
 #define UNKNOWN0
 #define MEASURE0x0001  /* same as IMA_MEASURE */
@@ -84,6 +85,7 @@ struct ima_rule_entry {
} lsm[MAX_LSM_RULES];
char *fsname;
	struct ima_rule_opt_list *keyrings; /* Measure keys added to these keyrings */
+	struct ima_rule_opt_list *data_sources; /* Measure data from these sources */
struct ima_template_desc *template;
 };
 
@@ -508,14 +510,23 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode,
 {
int i;
 
-   if (func == KEY_CHECK) {
-   return (rule->flags 

Re: [dm-devel] Promise and ALUA

2020-08-14 Thread McIntyre, Vincent (CASS, Marsfield)
On Fri, Aug 14, 2020 at 02:24:08AM +0200, Xose Vazquez Perez wrote:
>
>If E830f is an Active-Active array without ALUA support,
>add to /etc/multipath.conf:
>
>devices {
>device {
>vendor "Promise"
>product "VTrak"
>product_blacklist "VTrak V-LUN"
>path_grouping_policy "multibus"
>detect_prio "no"
>hardware_handler ""
>prio "const"
>failback "manual"
>no_path_retry 30
>}
>}
>
>run update-initramfs, and reboot.

I updated multipath.conf as above, ran update-initramfs.
# cat /etc/multipath.conf
devices {
device {
vendor "Promise"
product "VTrak"
product_blacklist "VTrak V-LUN"
path_grouping_policy "multibus"
detect_prio "no"
hardware_handler ""
prio "const"
failback "manual"
no_path_retry 30
}
}

Also, I removed the wwid of the test LUN

# multipath -v 3 -w 2228a0001558b1855
Aug 14 11:52:54 | set open fds limit to 1048576/1048576
Aug 14 11:52:54 | loading //lib/multipath/libchecktur.so checker
Aug 14 11:52:54 | checker tur: message table size = 3
Aug 14 11:52:54 | loading //lib/multipath/libprioconst.so prioritizer
Aug 14 11:52:54 | foreign library "nvme" loaded successfully
Aug 14 11:52:54 | libdevmapper version 1.02.155 (2018-12-18)
Aug 14 11:52:54 | DM multipath kernel driver v1.13.0
Aug 14 11:52:54 | removing line '/2228a0001558b1855/
' from wwids file
Aug 14 11:52:54 | found '/2228a0001558b1855/
'
wwid '2228a0001558b1855' removed
Aug 14 11:52:54 | unloading const prioritizer
Aug 14 11:52:54 | unloading tur checker

# cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:
/2221f000155c0792e/
/3600a098000b173f6079e5da82d73/
/222e30001469c/
#2228a0001558b1855/

# multipath -F
# multipath -l
(no output)
# multipath -r
# multipath -l |grep dm-
3600a098000b173f6079e5da82d73 dm-10 DELL,MD38xxf

# reboot

# multipath -ll
3600a098000b173f6079e5da82d73 dm-10 DELL,MD38xxf
size=40T features='5 queue_if_no_path pg_init_retries 50 queue_mode mq' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=14 status=active
| |- 1:0:10:0  sdaa 65:160 active ready running
| |- 1:0:12:0  sdac 65:192 active ready running
| |- 1:0:14:0  sdae 65:224 active ready running
| `- 1:0:8:0   sdy  65:128 active ready running
`-+- policy='service-time 0' prio=9 status=enabled
  |- 1:0:11:0  sdab 65:176 active ready running
  |- 1:0:13:0  sdad 65:208 active ready running
  |- 1:0:7:0   sdx  65:112 active ready running
  `- 1:0:9:0   sdz  65:144 active ready running

#  multipath -T
defaults {
verbosity 2
polling_interval 5
max_polling_interval 20
reassign_maps "no"
multipath_dir "//lib/multipath"
path_selector "service-time 0"
path_grouping_policy "failover"
uid_attribute "ID_SERIAL"
prio "const"
prio_args ""
features "0"
path_checker "tur"
alias_prefix "mpath"
failback "manual"
rr_min_io 1000
rr_min_io_rq 1
max_fds "max"
rr_weight "uniform"
queue_without_daemon "no"
flush_on_last_del "no"
user_friendly_names "no"
fast_io_fail_tmo 5
bindings_file "/etc/multipath/bindings"
wwids_file "/etc/multipath/wwids"
prkeys_file "/etc/multipath/prkeys"
log_checker_err always
all_tg_pt "no"
retain_attached_hw_handler "yes"
detect_prio "yes"
detect_checker "yes"
force_sync "no"
strict_timing "no"
deferred_remove "no"
config_dir "/etc/multipath/conf.d"
delay_watch_checks "no"
delay_wait_checks "no"
marginal_path_err_sample_time "no"
marginal_path_err_rate_threshold "no"
marginal_path_err_recheck_gap_time "no"
marginal_path_double_failed_time "no"
find_multipaths "strict"
uxsock_timeout 4000
retrigger_tries 0
retrigger_delay 10
missing_uev_wait_timeout 30
skip_kpartx "no"
disable_changed_wwids "yes"
remove_retries 0
ghost_delay "no"
find_multipaths_timeout -10
}
blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|dcssblk)[0-9]"
devnode "^(td|hd|vd)[a-z]"
devnode "^cciss!c[0-9]d[0-9]*"
device {
vendor "SGI"
product "Universal Xport"
}
device {
vendor "^DGC"
product "LUNZ"
}
device {
vendor "EMC"
product "LUNZ"
}
device {
vendor "DELL"
product "Universal Xport"
 

Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Meneghini, John
That works Mike. 

Unfortunately, I have to work with the corporate MS office email handler
(which sucks) and downloading the .patch file works much better for me.

Thanks,

/John

On 8/13/20, 11:43 AM, "Mike Snitzer"  wrote:

Maybe because I didn't cc linux-block?
Only way that I know to "upload this patch there" is to have cc'd
linux-block.

But the patch is in dm-devel's patchwork here:
https://patchwork.kernel.org/patch/11712563/

Is that sufficient for your needs?

Thanks,
Mike

On Thu, Aug 13 2020 at 11:29am -0400,
Meneghini, John  wrote:

> Mike,
>
> I don't see your patch at:
>
> https://patchwork.kernel.org/project/linux-block/list/?submitter=1643
>
> Can you please upload this patch there?
>
> Thanks,
>
> /John
>
> On 8/13/20, 10:48 AM, "Mike Snitzer"  wrote:
>
> Commit 764e9332098c0 ("nvme-multipath: do not reset on unknown
> status"), among other things, fixed NVME_SC_CMD_INTERRUPTED error
> handling by changing multipathing's nvme_failover_req() to short-circuit
> path failover and then fallback to NVMe's normal error handling (which
> takes care of NVME_SC_CMD_INTERRUPTED).
>
> This detour through native NVMe multipathing code is unwelcome because
> it prevents NVMe core from handling NVME_SC_CMD_INTERRUPTED independent
> of any multipathing concerns.
>
> Introduce nvme_status_needs_local_error_handling() to prioritize
> non-failover retry, when appropriate, in terms of normal NVMe error
> handling.  nvme_status_needs_local_error_handling() will naturally
> evolve to include handling of any other errors that normal error
> handling must be used for.
>
> nvme_failover_req()'s ability to fall back to normal NVMe error handling
> has been preserved because it may be useful for future NVME_SC that
> nvme_status_needs_local_error_handling() hasn't been trained for yet.
>
> Signed-off-by: Mike Snitzer 
> ---
>  drivers/nvme/host/core.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 88cff309d8e4..be749b690af7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -252,6 +252,16 @@ static inline bool nvme_req_needs_retry(struct request *req)
> return true;
>  }
>
> +static inline bool nvme_status_needs_local_error_handling(u16 status)
> +{
> +   switch (status & 0x7ff) {
> +   case NVME_SC_CMD_INTERRUPTED:
> +   return true;
> +   default:
> +   return false;
> +   }
> +}
> +
>  static void nvme_retry_req(struct request *req)
>  {
> struct nvme_ns *ns = req->q->queuedata;
> @@ -270,7 +280,8 @@ static void nvme_retry_req(struct request *req)
>
>  void nvme_complete_rq(struct request *req)
>  {
> -   blk_status_t status = nvme_error_status(nvme_req(req)->status);
> +   u16 nvme_status = nvme_req(req)->status;
> +   blk_status_t status = nvme_error_status(nvme_status);
>
> trace_nvme_complete_rq(req);
>
> @@ -280,7 +291,8 @@ void nvme_complete_rq(struct request *req)
> nvme_req(req)->ctrl->comp_seen = true;
>
> if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
> -   if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
> +   if (!nvme_status_needs_local_error_handling(nvme_status) &&
> +   (req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
> return;
>
> if (!blk_queue_dying(req->q)) {
> --
> 2.18.0
>
>




Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 11, 2020, at 5:03 PM, James Morris  wrote:
> 
> On Sat, 8 Aug 2020, Chuck Lever wrote:
> 
>> My interest is in code integrity enforcement for executables stored
>> in NFS files.
>> 
>> My struggle with IPE is that due to its dependence on dm-verity, it
>> does not seem to able to protect content that is stored separately
>> from its execution environment and accessed via a file access
>> protocol (FUSE, SMB, NFS, etc).
> 
> It's not dependent on DM-Verity, that's just one possible integrity 
> verification mechanism, and one of two supported in this initial 
> version. The other is 'boot_verified' for a verified or otherwise trusted 
> rootfs. Future versions will support FS-Verity, at least.
> 
> IPE was designed to be extensible in this way, with a strong separation of 
> mechanism and policy.

I got that, but it looked to me like the whole system relied on having
access to the block device under the filesystem. That's not possible
for a remote filesystem like Ceph or NFS.

I'm happy to take a closer look if someone can point me the right way.


--
Chuck Lever
chuckle...@gmail.com






Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Meneghini, John
On 8/13/20, 2:44 PM, "Christoph Hellwig"  wrote:

On Thu, Aug 13, 2020 at 01:47:04PM -0400, Mike Snitzer wrote:
> This is just a tweak to improve the high-level fault tree of core NVMe
> error handling.  No functional change, but for such basic errors,
> avoiding entering nvme_failover_req is meaningful on a code flow level.
> Makes code to handle errors that need local retry clearer by being more
> structured, less circuitous.
  
I don't understand how entering nvme_failover_req() is circuitous. 

This code path is only taken if REQ_NVME_MPATH is set which - unless I am
mistaken - in the case that you care about it will not be set.

> Allows NVMe core's handling of such errors to be more explicit and live
> in core.c rather than multipath.c -- so things like ACRE handling can be
> made explicitly part of core and not nested under nvme_failover_req's
> relatively obscure failsafe that returns false for anything it doesn't
> care about.

The ACRE handling is already explicitly a part of the core.  I don't
understand what you are after here Mike.  Are you saying that you don't
want the ACRE code to run when REQ_NVME_MPATH is clear?

If we're going that way I'd rather do something like the (untested)
patch below that adds a disposition function with a function that
decides it and then just switches on it:

Christoph, it looks like you've moved a lot of stuff around here, with no
actual functional change, but it's really hard for me to tell.  Please be
sure to cc me if this becomes a real patch.

How does your patch solve the problem of making dm-multipath work with
command retries?

Mike, do you want the nvme-core driver to retry commands on the same path,
with CRD, for the dm-multipath use case... or are you looking for a
different treatment of REQ_FAILFAST_DEV... or what?

Maybe I'm not seeing it.

/John

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 88cff309d8e4f0..a740320f0d4ee7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -241,17 +241,6 @@ static blk_status_t nvme_error_status(u16 status)
}
 }

-static inline bool nvme_req_needs_retry(struct request *req)
-{
-   if (blk_noretry_request(req))
-   return false;
-   if (nvme_req(req)->status & NVME_SC_DNR)
-   return false;
-   if (nvme_req(req)->retries >= nvme_max_retries)
-   return false;
-   return true;
-}
-
 static void nvme_retry_req(struct request *req)
 {
struct nvme_ns *ns = req->q->queuedata;
@@ -268,33 +257,75 @@ static void nvme_retry_req(struct request *req)
blk_mq_delay_kick_requeue_list(req->q, delay);
 }

-void nvme_complete_rq(struct request *req)
+enum nvme_disposition {
+   COMPLETE,
+   RETRY,
+   REDIRECT_ANA,
+   REDIRECT_TMP,
+};
+
+static inline enum nvme_disposition nvme_req_disposition(struct request *req)
+{
+   if (likely(nvme_req(req)->status == 0))
+   return COMPLETE;
+
+   if (blk_noretry_request(req) ||
+   (nvme_req(req)->status & NVME_SC_DNR) ||
+   nvme_req(req)->retries >= nvme_max_retries)
+   return COMPLETE;
+
+   if (req->cmd_flags & REQ_NVME_MPATH) {
+   switch (nvme_req(req)->status & 0x7ff) {
+   case NVME_SC_ANA_TRANSITION:
+   case NVME_SC_ANA_INACCESSIBLE:
+   case NVME_SC_ANA_PERSISTENT_LOSS:
+   return REDIRECT_ANA;
+   case NVME_SC_HOST_PATH_ERROR:
+   case NVME_SC_HOST_ABORTED_CMD:
+   return REDIRECT_TMP;
+   }
+   }
+
+   if (blk_queue_dying(req->q))
+   return COMPLETE;
+   return RETRY;
+}
+
+static inline void nvme_complete_req(struct request *req)
 {
blk_status_t status = nvme_error_status(nvme_req(req)->status);

-   trace_nvme_complete_rq(req);
+   if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+   req_op(req) == REQ_OP_ZONE_APPEND)
+   req->__sector = nvme_lba_to_sect(req->q->queuedata,
+   le64_to_cpu(nvme_req(req)->result.u64));
+
+   nvme_trace_bio_complete(req, status);
+   blk_mq_end_request(req, status);
+}

+void nvme_complete_rq(struct request *req)
+{
+   trace_nvme_complete_rq(req);
nvme_cleanup_cmd(req);

if (nvme_req(req)->ctrl->kas)
nvme_req(req)->ctrl->comp_seen = true;

-   if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-   if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
-   return;
-
-   

Re: [dm-devel] [RFC PATCH v5 00/11] Integrity Policy Enforcement LSM (IPE)

2020-08-14 Thread Chuck Lever



> On Aug 12, 2020, at 11:42 AM, James Bottomley wrote:
> 
> On Wed, 2020-08-12 at 09:56 -0400, Chuck Lever wrote:
>>> On Aug 11, 2020, at 2:28 PM, James Bottomley wrote:
>>> 
>>> On Tue, 2020-08-11 at 10:48 -0400, Chuck Lever wrote:
 Mimi's earlier point is that any IMA metadata format that
 involves unsigned digests is exposed to an alteration attack at
 rest or in transit, thus will not provide a robust end-to-end
 integrity guarantee.
>>> 
>>> I don't believe that is Mimi's point, because it's mostly not
>>> correct: the xattr mechanism does provide this today.  The point is
>>> the mechanism we use for storing IMA hashes and signatures today is
>>> xattrs because they have robust security properties for local
>>> filesystems that the kernel enforces.  This use goes beyond IMA,
>>> selinux labels for instance use this property as well.
>> 
>> I don't buy this for a second. If storing a security label in a
>> local xattr is so secure, we wouldn't have any need for EVM.
> 
> What don't you buy?  Security xattrs can only be updated by local root.
> If you trust local root, the xattr mechanism is fine ... it's the only
> one a lot of LSMs use, for instance.  If you don't trust local root or
> worry about offline backups, you use EVM.  A thing isn't secure or
> insecure, it depends on the threat model.  However, if you don't trust
> the NFS server it doesn't matter whether you do or don't trust local
> root, you can't believe the contents of the xattr.
> 
>>> What I think you're saying is that NFS can't provide the robust
>>> security for xattrs we've been relying on, so you need some other
>>> mechanism for storing them.
>> 
>> For NFS, there's a network traversal which is an attack surface.
>> 
>> A local xattr can be attacked as well: a device or bus malfunction
>> can corrupt the content of an xattr, or a privileged user can modify
>> it.
>> 
>> How does that metadata get from the software provider to the end
>> user? It's got to go over a network, stored in various ways, some
>> of which will not be trusted. To attain an unbroken chain of
>> provenance, that metadata has to be signed.
>> 
>> I don't think the question is the storage mechanism, but rather the
>> protection mechanism. Signing the metadata protects it in all of
>> these cases.
> 
> I think we're saying about the same thing.

Roughly.


> For most people the
> security mechanism of local xattrs is sufficient.  If you're paranoid,
> you don't believe it is and you use EVM.

When IMA metadata happens to be stored in local filesystems in
a trusted xattr, it's going to enjoy the protection you describe
without needing the addition of a cryptographic signature.

However, that metadata doesn't live its whole life there. It
can reside in a tar file, it can cross a network, it can live
on a back-up tape. I think we agree that any time that metadata
is in transit or at rest outside of a Linux local filesystem, it
is exposed.

Thus I'm interested in a metadata protection mechanism that does
not rely on the security characteristics of a particular storage
container. For me, a cryptographic signature fits that bill
nicely.


>>> I think Mimi's other point is actually that IMA uses a flat hash
>>> which we derive by reading the entire file and then watching for
>>> mutations. Since you cannot guarantee we get notice of mutation
>>> with NFS, the entire IMA mechanism can't really be applied in its
>>> current form and we have to resort to chunk at a time verifications
>>> that a Merkle tree would provide.
>> 
>> I'm not sure what you mean by this. An NFS client relies on
>> notification of mutation to maintain the integrity of its cache of
>> NFS file content, and it's done that since the 1980s.
> 
> Mutation detection is part of the current IMA security model.  If IMA
> sees a file mutate it has to be rehashed the next time it passes the
> gate.  If we can't trust the NFS server, we can't trust the NFS
> mutation notification and we have to have a different mechanism to
> check the file.

When an NFS server lies about mtime and ctime, then NFS is completely
broken. Untrusted NFS server doesn't mean "broken behavior" -- I
would think that local filesystems will have the same problem if
they can't trust a local block device to store filesystem metadata
like indirect blocks and timestamps.

It's not clear to me that IMA as currently implemented can protect
against broken storage devices or incorrect filesystem behavior.


>> In addition to examining a file's mtime and ctime as maintained by
>> the NFS server, a client can rely on the file's NFSv4 change
>> attribute or an NFSv4 delegation.
> 
> And that's secure in the face of a malicious or compromised server?
> 
> The bottom line is still, I think we can't use linear hashes with an
> open/exec/mmap gate with NFS and we have to move to chunk at a time
> verification like that provided by a Merkle tree.

That's fine until we claim that remote filesystems require 

Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Sagi Grimberg




+static inline enum nvme_disposition nvme_req_disposition(struct request *req)
+{
+   if (likely(nvme_req(req)->status == 0))
+   return COMPLETE;
+
+   if (blk_noretry_request(req) ||
+   (nvme_req(req)->status & NVME_SC_DNR) ||
+   nvme_req(req)->retries >= nvme_max_retries)
+   return COMPLETE;
+
+   if (req->cmd_flags & REQ_NVME_MPATH) {
+   switch (nvme_req(req)->status & 0x7ff) {
+   case NVME_SC_ANA_TRANSITION:
+   case NVME_SC_ANA_INACCESSIBLE:
+   case NVME_SC_ANA_PERSISTENT_LOSS:
+   return REDIRECT_ANA;
+   case NVME_SC_HOST_PATH_ERROR:
+   case NVME_SC_HOST_ABORTED_CMD:
+   return REDIRECT_TMP;
+   }
+   }
+
+   if (blk_queue_dying(req->q))
+   return COMPLETE;
+   return RETRY;
+}
+
+static inline void nvme_complete_req(struct request *req)
  {
blk_status_t status = nvme_error_status(nvme_req(req)->status);
  
-	trace_nvme_complete_rq(req);
+   if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+   req_op(req) == REQ_OP_ZONE_APPEND)
+   req->__sector = nvme_lba_to_sect(req->q->queuedata,
+   le64_to_cpu(nvme_req(req)->result.u64));
+
+   nvme_trace_bio_complete(req, status);
+   blk_mq_end_request(req, status);
+}
  
+void nvme_complete_rq(struct request *req)
+{
+   trace_nvme_complete_rq(req);
nvme_cleanup_cmd(req);
  
  	if (nvme_req(req)->ctrl->kas)
  		nvme_req(req)->ctrl->comp_seen = true;
  
-	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
-   if ((req->cmd_flags & REQ_NVME_MPATH) && nvme_failover_req(req))
-   return;
-
-   if (!blk_queue_dying(req->q)) {
-   nvme_retry_req(req);
-   return;
-   }
-   } else if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
-  req_op(req) == REQ_OP_ZONE_APPEND) {
-   req->__sector = nvme_lba_to_sect(req->q->queuedata,
-   le64_to_cpu(nvme_req(req)->result.u64));
+   switch (nvme_req_disposition(req)) {
+   case COMPLETE:
+   nvme_complete_req(req);


nvme_complete_rq calling nvme_complete_req... Maybe call it
__nvme_complete_rq instead?


+   return;
+   case RETRY:
+   nvme_retry_req(req);
+   return;
+   case REDIRECT_ANA:
+   nvme_failover_req(req, true);
+   return;
+   case REDIRECT_TMP:
+   nvme_failover_req(req, false);
+   return;
}
-
-   nvme_trace_bio_complete(req, status);
-   blk_mq_end_request(req, status);
  }
  EXPORT_SYMBOL_GPL(nvme_complete_rq);
  
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 3ded54d2c9c6ad..0c22b2c88687a2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -65,51 +65,32 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
}
  }
  
-bool nvme_failover_req(struct request *req)
+void nvme_failover_req(struct request *req, bool is_ana_status)
  {
struct nvme_ns *ns = req->q->queuedata;
-   u16 status = nvme_req(req)->status;
unsigned long flags;
  
-	switch (status & 0x7ff) {
-   case NVME_SC_ANA_TRANSITION:
-   case NVME_SC_ANA_INACCESSIBLE:
-   case NVME_SC_ANA_PERSISTENT_LOSS:
-   /*
-* If we got back an ANA error we know the controller is alive,
-* but not ready to serve this namespace.  The spec suggests
-* we should update our general state here, but due to the fact
-* that the admin and I/O queues are not serialized that is
-* fundamentally racy.  So instead just clear the current path,
-* mark the path as pending and kick off a re-read of the ANA
-* log page ASAP.
-*/
-   nvme_mpath_clear_current_path(ns);
-   if (ns->ctrl->ana_log_buf) {
-   set_bit(NVME_NS_ANA_PENDING, &ns->flags);
-   queue_work(nvme_wq, &ns->ctrl->ana_work);
-   }
-   break;
-   case NVME_SC_HOST_PATH_ERROR:
-   case NVME_SC_HOST_ABORTED_CMD:
-   /*
-* Temporary transport disruption in talking to the controller.
-* Try to send on a new path.
-*/
-   nvme_mpath_clear_current_path(ns);
-   break;
-   default:
-   /* This was a non-ANA error so follow the normal error path. */
-   return false;
+   nvme_mpath_clear_current_path(ns);
+
+   /*
+* If we got back an ANA error we know the controller is alive, but not
+* ready to serve this namespace.  The spec suggests we should update
+* our general