Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-21 Thread Wouter Verhelst
On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> This patch mirrors the loopback device behavior, with a few changes.  First,
> there is no DEL operation, as NBD doesn't get as much churn as loop devices do.
> Secondly, the GET_NEXT operation can optionally create a new NBD device or not.
> Our infrastructure people want to not allow NBD to create new devices, as it
> causes problems for them in containers.  However, allow this to be optional, as
> things like the OSS NBD client probably don't care and would just like to be
> given a device to use.

Don't be so sure :-)

I agree that having a control device for NBD is useful and would make
certain things much easier. If that's added, then I'll move to using
that as a way to control the device rather than opening a device and
dealing with it that way. In fact, at some point in the past I did
suggest something along those lines myself; it just hasn't happened yet.

Obviously, though, this would require a transition period in which the
control device would be available as well as (optionally, perhaps) the
old way where you open a specific device node, so that we don't break
existing implementations before they've had a chance to follow suit.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-25 Thread Wouter Verhelst
On Tue, Jan 24, 2017 at 08:11:52AM +0100, Greg KH wrote:
> On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:
> > I explained it in the changelog and my response to Wouter.  NBD preallocates
> > all of its /dev/nbd# devices at modprobe time, so there's no way to add new
> > devices as we need them.
> 
> Why not fix that odd restriction?

Isn't that what this patch is trying to do?

> > Loop accomplishes this with the /dev/loop-control
> > and an ioctl.  Then we also need a way to figure out what is the first
> > /dev/nbd# device that isn't currently in use in order to pick the next one
> > to configure.  Keep in mind that all of the configuration for /dev/nbd#
> > devices is done through ioctls to those devices, so having an ioctl interface
> > for the control device is consistent with the rest of how NBD works.
> 
> But adding a random char device node and using ioctls on it is different
> than using an ioctl on an existing block device node that is already
> there.
> 
> So what _exactly_ do you need to do with this interface?  What data do
> you need sent/received to/from the kernel?  Specifics please.

AIUI: create new devices, and figure out which device is the next free
one, so we can find one without having to check the pid attribute of
every device (as is needed today, which is racy).
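
To make that concrete, here is roughly what I'd expect the userspace side
to look like, modelled on /dev/loop-control's LOOP_CTL_GET_FREE. Note
that the device path and the ioctl name and number below are made up;
they stand in for whatever the patch ends up defining:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical request number; not from the actual patch. */
#define NBD_CTL_GET_NEXT 0xab00

int main(void)
{
	char dev[32];
	int index;
	int ctl = open("/dev/nbd-control", O_RDWR);

	if (ctl < 0) {
		perror("open /dev/nbd-control");
		return 1;
	}
	/* Ask for the index of the first free nbd device, optionally
	 * creating one if none is free. */
	index = ioctl(ctl, NBD_CTL_GET_NEXT, 0);
	if (index < 0) {
		perror("NBD_CTL_GET_NEXT");
		close(ctl);
		return 1;
	}
	snprintf(dev, sizeof(dev), "/dev/nbd%d", index);
	printf("next free device: %s\n", dev);
	close(ctl);
	return 0;
}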

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-25 Thread Wouter Verhelst
On Wed, Jan 25, 2017 at 04:48:27PM +, Alex Gartrell wrote:
> On 1/25/17, 6:23 AM, "arndbergm...@gmail.com on behalf of Arnd Bergmann" wrote:
> > We have multiple established ways to deal with this kind of problem, the
> > most common ones being a separate chardev as you propose, a sysfs
> > interface and netlink.
> > 
> > They all have their own set of problems, but the needs of nbd as a network
> > storage interface seem to most closely resemble what we have for other
> > network related interfaces, where we typically use netlink to do the setup;
> > see e.g. macvtap as an example for a network chardev interface.
> > 
> > Can you have a look if that would solve your problem more nicely?
> 
> FWIW, I'm the one consuming this nbd stuff at Facebook and bringing
> netlink into this would be a huge pain for me.  It's very weird to
> need to pull sockets in and then drop back to ioctls from a design
> perspective, and future consumers of this API would be sad as well.

As the person who's been maintaining the official userspace nbd client
for 15 years now, I have to concur. NBD has had an API that deals with
/dev/nbdX since forever; adding other APIs to that just feels wrong.

> This is compounded, I think, by the fact that the breadth of
> functionality here doesn't really merit a shared library for everyone
> to use -- could you imagine libnbdcontrol, which exposes a single
> "get_next_device" function?

Please no :)

> If nbd were *all* netlink I think that that'd be fine, but you'd have
> problems implementing the NBD_DO_IT function in that fashion.  So I'd
> rather stick to the char device ioctl thing because it's more
> consistent with the old NBD stuff as well as the loop device stuff.

Indeed.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH] nbd: set the logical and physical blocksize properly

2017-02-11 Thread Wouter Verhelst
On Fri, Feb 10, 2017 at 04:47:42PM -0500, Josef Bacik wrote:
> On Fri, 2017-02-10 at 21:07 +0100, Alex Bligh wrote:
> > > 
> > > On 10 Feb 2017, at 19:06, Josef Bacik  wrote:
> > > 
> > > We noticed when trying to do O_DIRECT to an export on the server
> > > side
> > > that we were getting requests smaller than the 4k sectorsize of the
> > > device.  This is because the client isn't setting the logical and
> > > physical blocksizes properly for the underlying device.  Fix this
> > > up by
> > > setting the queue blocksizes and then calling bd_set_size.
> > Interesting. Some input into the info extension (re blocksizes) would
> > definitely be appreciated.
> > 
> 
> What do you mean?  Right now the client is just calling NBD_SET_BLKSIZE
> with 4k blocksize since all of our devices are 4k drives.  Thanks,

He's talking about the proposed info extension, which is a protocol
extension that hasn't been implemented yet but would be relevant to this
patch.
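
For context, the client-side ioctl in question is about as simple as it
gets; a minimal sketch (error handling trimmed), using NBD_SET_BLKSIZE
from <linux/nbd.h>:

#include <fcntl.h>
#include <linux/nbd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Tell the kernel the export uses 4k blocks. With Josef's patch, the
 * driver also propagates this to the queue's logical and physical
 * block sizes. */
int set_blocksize_4k(const char *devpath)
{
	int fd = open(devpath, O_RDWR);

	if (fd < 0)
		return -1;
	if (ioctl(fd, NBD_SET_BLKSIZE, 4096UL) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}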

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 05:20:08AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 02:01:59PM +0200, Wouter Verhelst wrote:
> > Yes. There was some discussion on that part, and we decided that setting
> > the flag doesn't hurt, but the spec also clarifies that using it on READ
> > does nothing, semantically.
> >
> > 
> > The problem is that there are clients in the wild which do set it on
> > READ, so it's just a matter of "be liberal in what you accept".
> 
> Note that FUA on READ in SCSI and NVMe does have a meaning - it
> requires you to bypass any sort of cache on the target.  I think it's a
> wrong definition because it mandates implementation details that aren't
> observable by the initiator, but it's still the spec wording and nbd
> diverges from it.  That's not necessarily a bad thing, but a caveat to
> look out for.

Yes. I think the kernel nbd driver should probably filter out FUA on
READ. It has no meaning in the case of nbd, and whatever expectations
the kernel may have cannot be provided for by nbd anyway.
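
Something along these lines in the driver's command setup would do it (a
sketch only, not against any particular tree, so field names may differ):

#include <linux/blkdev.h>
#include <linux/nbd.h>

/* Only let FUA through for writes; on reads it has no meaning in nbd,
 * so filter it out rather than confuse servers. */
static u32 nbd_cmd_flags(struct request *req)
{
	u32 flags = 0;

	if ((req->cmd_flags & REQ_FUA) && rq_data_dir(req) == WRITE)
		flags |= NBD_CMD_FLAG_FUA;
	return flags;
}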

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 01:44:29PM +0100, Alex Bligh wrote:
> 
> > On 15 Sep 2016, at 13:41, Christoph Hellwig  wrote:
> > 
> > On Thu, Sep 15, 2016 at 01:39:11PM +0100, Alex Bligh wrote:
> >> That's probably right in the case of file-based back ends that
> >> are running on a Linux OS. But gonbdserver for instance supports
> >> (e.g.) Ceph based backends, where each connection might be talking
> >> to a completely separate ceph node, and there may be no cache
> >> consistency between connections.
> > 
> > Yes, if you don't have a cache coherent backend you are generally
> > screwed with a multiqueue protocol.
> 
> I wonder if the ability to support multiqueue should be visible
> in the negotiation stage. That would allow the client to refuse
> to select multiqueue where it isn't safe.

The server can always refuse to allow multiple connections.

I was thinking of changing the spec as follows:

diff --git a/doc/proto.md b/doc/proto.md
index 217f57e..cb099e2 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -308,6 +308,23 @@ specification, the
 [kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
 may be useful.
 
+For performance reasons, clients MAY open multiple connections to the
+same server. To support such clients, servers SHOULD ensure that at
+least one of the following conditions hold:
+
+* Flush commands are processed for ALL connections. That is, when an
+  `NBD_CMD_WRITE` is processed on one connection, and then an
+  `NBD_CMD_FLUSH` is processed on another connection, the data of the
+  `NBD_CMD_WRITE` on the first connection MUST reach permanent storage
+  before the reply of the `NBD_CMD_FLUSH` is sent.
+* The server allows `NBD_CMD_WRITE` and `NBD_CMD_FLUSH` on at most one
+  connection
+* Multiple connections are not allowed
+
+In addition, clients using multiple connections SHOULD NOT send
+`NBD_CMD_FLUSH` if an `NBD_CMD_WRITE` for which they care in relation to
+the flush has not been replied to yet.
+
  Request message
 
 The request message, sent by the client, looks as follows:

The latter bit (on the client side) is because even if your backend has
no cache coherency issues, TCP does not guarantee ordering between
multiple connections. I don't know if the above is in line with what
blk-mq does, but consider the following scenario:

- A client sends two writes to the server, followed (immediately) by a
  flush, where at least the second write and the flush are not sent over
  the same connection.
- The first write is a small one, and it is handled almost immediately.
- The second write takes a little longer, so the flush is handled
  earlier than the second write
- The network packet containing the flush reply gets lost for whatever
  reason, so the client doesn't get it, and we fall into TCP
  retransmits.
- The second write finishes, and its reply header does not get lost
- After the second write reply reaches the client, the TCP retransmits
  for the flush reply are handled.

In the above scenario, the flush reply arrives on the client side after
a write reply which it did not cover; so the client will (incorrectly)
assume that the write has reached permanent storage when in fact it may
not have done so yet.

If the kernel does not care about the ordering of the two writes versus
the flush, then there is no problem. I don't know how blk-mq works in
that context, but if the above is a likely scenario, we may have to
reconsider adding blk-mq to nbd.
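
On the client side, the SHOULD NOT above amounts to very little code; a
sketch, assuming the client keeps one counter per device rather than per
connection:

#include <stdatomic.h>
#include <stdbool.h>

/* One counter shared by all connections to the same device. */
static atomic_int writes_in_flight;

static void write_submitted(void) { atomic_fetch_add(&writes_in_flight, 1); }
static void write_replied(void)   { atomic_fetch_sub(&writes_in_flight, 1); }

/* Only send NBD_CMD_FLUSH once every write we care about has been
 * replied to; otherwise the flush is not guaranteed to cover them. */
static bool may_send_flush(void)
{
	return atomic_load(&writes_in_flight) == 0;
}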

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 05:08:21PM +0100, Alex Bligh wrote:
> Wouter,
> 
> > The server can always refuse to allow multiple connections.
> 
> Sure, but it would be neater to warn the client of that at negotiation
> stage (it would only be one flag, e.g.  'multiple connections
> unsafe').

I suppose that's not a bad idea.

[...]
> > I was thinking of changing the spec as follows:
> > 
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 217f57e..cb099e2 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -308,6 +308,23 @@ specification, the
> > [kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
> > may be useful.
> > 
> > +For performance reasons, clients MAY open multiple connections to the
> > +same server. To support such clients, servers SHOULD ensure that at
> > +least one of the following conditions hold:
> > +
> > +* Flush commands are processed for ALL connections. That is, when an
> > +  `NBD_CMD_WRITE` is processed on one connection, and then an
> > +  `NBD_CMD_FLUSH` is processed on another connection, the data of the
> > +  `NBD_CMD_WRITE` on the first connection MUST reach permanent storage
> > +  before the reply of the `NBD_CMD_FLUSH` is sent.
> > +* The server allows `NBD_CMD_WRITE` and `NBD_CMD_FLUSH` on at most one
> > +  connection
> > +* Multiple connections are not allowed
> > +
> > +In addition, clients using multiple connections SHOULD NOT send
> > +`NBD_CMD_FLUSH` if an `NBD_CMD_WRITE` for which they care in relation to
> > +the flush has not been replied to yet.
> > +
> 
> I don't think that should be a mandatory behaviour.

Which part of it?

> For one, it would
> be reasonably easy on gonbdserver but a PITA on the reference server.
> You'd need to put in IPC between each of the forked processes OR rely
> on fdatasync() only - I'm not sure that would necessarily work
> safely with (e.g.) the 'treefiles' / COW options.
> 
> I think better would be to say that the server MUST either
> 
> * Not support NBD_CMD_FLUSH at all

I think we should discourage not supporting FLUSH, rather than
suggesting it. 

> * Support NBD_CMD_FLUSH across channels (as you set out above), or
> * Indicate that it does not support multiple channels.

You dropped the one with no writes. I said "at most" there for a reason.
Originally I was going to say "if the server is read-only", but then
thought that it could work to do the "at most" thing. After having given
that some more thought, I now realize that if you write, you need to
flush across to other channels, regardless of whether they write too, so
that bit of it is moot now anyway.

Still, a server which exports read-only should still be safe for
multiple connections, even if there is no cache coherency (since
presumably nothing's going to change anyway).

[...]
> > The latter bit (on the client side) is because even if your backend has
> > no cache coherency issues, TCP does not guarantee ordering between
> > multiple connections. I don't know if the above is in line with what
> > blk-mq does, but consider the following scenario:
> > 
> > - A client sends two writes to the server, followed (immediately) by a
> >  flush, where at least the second write and the flush are not sent over
> >  the same connection.
> > - The first write is a small one, and it is handled almost immediately.
> > - The second write takes a little longer, so the flush is handled
> >  earlier than the second write
> > - The network packet containing the flush reply gets lost for whatever
> >  reason, so the client doesn't get it, and we fall into TCP
> >  retransmits.
> > - The second write finishes, and its reply header does not get lost
> > - After the second write reply reaches the client, the TCP retransmits
> >  for the flush reply are handled.
> > 
> > In the above scenario, the flush reply arrives on the client side after
> > a write reply which it did not cover; so the client will (incorrectly)
> > assume that the write has reached permanent storage when in fact it may
> > not have done so yet.
> > 
> > If the kernel does not care about the ordering of the two writes versus
> > the flush, then there is no problem. I don't know how blk-mq works in
> > that context, but if the above is a likely scenario, we may have to
> > reconsider adding blk-mq to nbd.
> 
> Actually I think this is a problem anyway. A simpler failure case is
> one where (by chance) one channel gets the writes, and one channel
> gets the flushes. The flush reply is delayed beyond the replies to
> unconnected writes (on the other channel) and hence the kernel thinks
> replied-to writes have been persisted when they have not been.

Yes, that is another example of essentially the same problem.

> The only way to fix that (as far as I can see) without changing flush
> semantics is for the block layer to issue flush requests on each
> channel when flushing on one channel.

Christoph just said that that doesn't (currently) happen; I don't know

Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Wouter Verhelst
On Thu, Sep 29, 2016 at 10:03:50AM -0400, Josef Bacik wrote:
> So think of it like normal disks with multiple channels.  We don't send
> flushes down all the hwq's to make sure they are clear, we leave that
> decision up to the application (usually a FS of course).

Well, when I asked earlier, Christoph said[1] that blk-mq assumes that
when a FLUSH is sent over one channel, and the reply comes in, that all
commands which have been received, regardless of which channel they were
received over, have reached disk.

[1] Message-ID: <20160915122304.ga15...@infradead.org>

It is impossible for nbd to make such a guarantee, due to head-of-line
blocking on TCP.

[...]
> perhaps we could add a flag later that says send all the flushes down
> all the connections for the paranoid, it should be relatively
> straightforward to do.  Thanks,

That's not an unreasonable approach, I guess.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-06 Thread Wouter Verhelst
On Thu, Oct 06, 2016 at 10:41:36AM +0100, Alex Bligh wrote:
> Wouter,
[...]
> > Given that, given the issue in the previous
> > paragraph, and given the uncertainty introduced with multiple
> > connections, I think it is reasonable to say that a client should just
> > not assume a flush touches anything except for the writes for which it
> > has already received a reply by the time the flush request is sent out.
> 
> OK. So you are proposing weakening the semantic for flush (saying that
> it is only guaranteed to cover those writes for which the client has
> actually received a reply prior to sending the flush, as opposed to
> prior to receiving the flush reply). This is based on the view that
> the Linux kernel client wouldn't be affected, and if other clients
> were affected, their behaviour would be 'somewhat unusual'.

Right.

> We do have one significant other client out there that uses flush
> which is Qemu. I think we should get a view on whether they would be
> affected.

That's certainly something to consider, yes.

> > Those are semantics that are actually useful and can be guaranteed in
> > the face of multiple connections. Other semantics can not.
> 
> Well there is another semantic which would work just fine, and also
> cures the other problem (synchronisation between channels) which would
> be simply that flush is only guaranteed to affect writes issued on the
> same channel. Then flush would do the natural thing, i.e. flush
> all the writes that had been done *on that channel*.

That is an option, yes, but the natural result will be that you issue N
flush requests, rather than one, which I'm guessing will kill
performance. Therefore, I'd prefer not to go down that route.

[...]
> > It is indeed impossible for a server to know what has been received by
> > the client by the time it (the client) sent out the flush request.
> > However, the server doesn't need that information, at all. The flush
> > request's semantics do not say that any request not covered by the flush
> > request itself MUST NOT have hit disk; instead, it just says that there
> > is no guarantee on whether or not that is the case. That's fine; all a
> > server needs to know is that when it receives a flush, it needs to
> > fsync() or some such, and then send the reply. All a *client* needs to
> > know is which requests have most definitely hit the disk. In my
> > proposal, those are the requests that finished before the flush request
> > was sent, and not the requests that finished between that and when the
> > flush reply is received. Those are *likely* to also be covered
> > (especially on single-connection NBD setups), but in my proposal,
> > they're no longer *guaranteed* to be.
> 
> I think my objection was more that you were writing mandatory language
> for a server's behaviour based on what the client perceives.
> 
> What you are saying from the client's point of view is that it under
> your proposed change it can only rely on that writes in respect of
> which the reply has been received prior to issuing the flush are persisted
> to disk (more might be persisted, but the client can't rely on it).

Exactly.

[...]
> IE I don't actually think the wording from the server side needs changing
> now I see what you are trying to do. Just we need a new paragraph saying
> what the client can and cannot rely on.

That's obviously also a valid option. I'm looking forward to your
proposed wording then :-)

[...]
> >> I suppose that's fine in that we can at least shorten the CC: line,
> >> but I still think it would be helpful if the protocol
> > 
> > unfinished sentence here...
> 
>  but I still think it would be helpful if the protocol helped out
> the end user of the client and refused to negotiate multichannel
> connections when they are unsafe. How is the end client meant to know
> whether the back end is not on Linux, not on a block device, done
> via a Ceph driver etc?

Well, it isn't. The server, if it provides certain functionality, should
also provide particular guarantees. If it can't provide those
guarantees, it should not provide that functionality.

e.g., if a server runs on a backend with cache coherency issues, it
should not allow multiple connections to the same device, etc.

> I still think it's pretty damn awkward that with a ceph back end
> (for instance) which would be one of the backends to benefit the
> most from multichannel connections (as it's inherently parallel),
> no one has explained how flush could be done safely.

If ceph doesn't have any way to guarantee that a write is available to
all readers of a particular device, then it *cannot* be used to map
block device semantics with multiple channels. Therefore, it should not
allow writing to the device from multiple clients, period, unless the
filesystem (or other thing) making use of the nbd device above the ceph
layer actually understands how things may go wrong and can take care of
it.

As such, I don't think that the problems inherent in using 

Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-06 Thread Wouter Verhelst
On Thu, Oct 06, 2016 at 06:16:30AM -0700, Christoph Hellwig wrote:
> On Thu, Oct 06, 2016 at 03:09:49PM +0200, Wouter Verhelst wrote:
> > Okay, I've updated the proto.md file then, to clarify that in the case
> > of multiple connections, a client MUST NOT send a flush request until it
> > has seen the replies to the write requests that it cares about. That
> > should be enough for now.
> 
> How do you guarantee that nothing has been reordered or even lost even for
> a single connection?

In the case of a single connection, we already stated that the flush
covers the write requests for which a reply has already been sent out by
the time the flush reply is sent out. On a single connection, there is
no way an implementation can comply with the old requirement but not the
new one.

We do not guarantee any ordering beyond that; and lost requests would be
a bug in the server.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 12:09:28PM +0100, Alex Bligh wrote:
> Wouter, Josef, (& Eric)
> 
> > On 15 Sep 2016, at 11:49, Wouter Verhelst <w...@uter.be> wrote:
> > 
> > Hi,
> > 
> > On Fri, Sep 09, 2016 at 10:02:03PM +0200, Wouter Verhelst wrote:
> >> I see some practical problems with this:
> > [...]
> > 
> > One more that I didn't think about earlier:
> > 
> > A while back, we spent quite some time defining the semantics of the
> > various commands in the face of the NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
> > write barriers. At the time, we decided that it would be unreasonable
> > to expect servers to make these write barriers effective across
> > different connections.
> 
> Actually I wonder if there is a wider problem in that implementations
> might mediate access to a device by presence of an extant TCP connection,
> i.e. only permit one TCP connection to access a given block device at
> once. If you think about (for instance) a forking daemon that does
> writeback caching, that would be an entirely reasonable thing to do
> for data consistency.

Sure. They will have to live with the fact that clients connected to
them will run slower; I don't think that's a problem. In addition,
Josef's client implementation requires the user to explicitly ask for
multiple connections.

There are multiple contexts in which NBD can be used, and in some
performance is more important than in others. I think that is fine.

[...]
> A more general point is that with multiple queues requests
> may be processed in a different order even by those servers that
> currently process the requests in strict order, or in something
> similar to strict order. The server is permitted by the spec
> (save as mandated by NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA) to
> process commands out of order anyway, but I suspect this has
> to date been little tested.

Yes, and that is why I was asking about this. If the write barriers
are expected to be shared across connections, we have a problem. If,
however, they are not, then it doesn't matter that the commands may be
processed out of order.

[...]

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 04:52:17AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 12:46:07PM +0100, Alex Bligh wrote:
> > Essentially NBD does supports FLUSH/FUA like this:
> > 
> > https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
> > 
> > IE supports the same FLUSH/FUA primitives as other block drivers (AIUI).
> > 
> > Link to protocol (per last email) here:
> > 
> > https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes
> 
> Flush as defined by the Linux block layer (and supported that way in
> SCSI, ATA, NVMe) only requires to flush all already completed writes
> to non-volatile media.

That is precisely what FLUSH in nbd does, too.

> It does not impose any ordering unlike the nbd spec.

If you read the spec differently, then that's a bug in the spec. Can you
clarify which part of it caused that confusion? We should fix it, then.

> FUA as defined by the Linux block layer (and supported that way in SCSI,
> ATA, NVMe) only requires the write operation the FUA bit is set on to be
> on non-volatile media before completing the write operation.  It does
> not impose any ordering, which seems to match the nbd spec.  Unlike the
> NBD spec Linux does not allow FUA to be set on anything by WRITE
> commands.  Some other storage protocols allow a FUA bit on READ
> commands or other commands that write data to the device, though.

Yes. There was some discussion on that part, and we decided that setting
the flag doesn't hurt, but the spec also clarifies that using it on READ
does nothing, semantically.

The problem is that there are clients in the wild which do set it on
READ, so it's just a matter of "be liberal in what you accept".

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
On Thu, Sep 15, 2016 at 04:38:07AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 12:49:35PM +0200, Wouter Verhelst wrote:
> > A while back, we spent quite some time defining the semantics of the
> > various commands in the face of the NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
> > write barriers. At the time, we decided that it would be unreasonable
> > to expect servers to make these write barriers effective across
> > different connections.
> 
> Do you have a nbd protocol specification?

https://github.com/yoe/nbd/blob/master/doc/proto.md

> treating a flush or fua as any sort of barrier is incredibly stupid.

Maybe I'm not using the correct terminology here. The point is that
after a FLUSH, the server asserts that all write commands *for which a
reply has already been sent to the client* will also have reached
permanent storage. Nothing is asserted about commands for which the
reply has not yet been sent, regardless of whether they were sent to the
server before or after the FLUSH; they may reach permanent storage as a
result of the FLUSH, or they may not, we don't say.

With FUA, we only assert that the FUA-flagged command reaches permanent
storage before its reply is sent, nothing else.

If that's not a write barrier, then I was using the wrong terminology
(and offer my apologies for the confusion).
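
To make that concrete: for a flat-file backend, the server side of both
semantics is essentially the following (a sketch; real servers are more
involved, and may use fsync() rather than fdatasync()):

#include <sys/types.h>
#include <unistd.h>

/* NBD_CMD_FLUSH: everything we have already replied to must be on
 * stable storage before we send the flush reply. */
static int handle_flush(int backend_fd)
{
	return fdatasync(backend_fd);
}

/* FUA write: this command's data must be on stable storage before its
 * own reply goes out. fdatasync() flushes more than strictly needed,
 * which is allowed. */
static ssize_t handle_write_fua(int backend_fd, const void *buf,
				size_t len, off_t offset)
{
	ssize_t n = pwrite(backend_fd, buf, len, offset);

	if (n < 0 || fdatasync(backend_fd) < 0)
		return -1;
	return n;
}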

The point still remains that "X was sent before Y" is difficult to
determine on the client side if X was sent over a different TCP channel
than Y, because a packet might be dropped (requiring a retransmit) for
X, and similar things. If blk-mq can deal with that, we're good and
nothing further needs to be done. If not, this should be evaluated by
someone more familiar with the internals of the kernel block subsystem
than me.

> Is it really documented that way, and if yes, why?

The documentation does use the term write barrier, but only right before
the same explanation as above. It does so because I assumed that that is
what a write barrier is, and that this would make things easier to
understand. If you tell me that that term is wrong as used there, I can
easily remove it (it's not critical to the rest of the documentation).

Regards,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Wouter Verhelst
Hi,

On Fri, Sep 09, 2016 at 10:02:03PM +0200, Wouter Verhelst wrote:
> I see some practical problems with this:
[...]

One more that I didn't think about earlier:

A while back, we spent quite some time defining the semantics of the
various commands in the face of the NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
write barriers. At the time, we decided that it would be unreasonable
to expect servers to make these write barriers effective across
different connections.

Since my knowledge of kernel internals is limited, I tried finding some
documentation on this, but I guess that either it doesn't exist or I'm
looking in the wrong place; therefore, am I correct in assuming that
blk-mq knows about such semantics, and will handle them correctly (by
either sending a write barrier to all queues, or not making assumptions
about write barriers that were sent over a different queue)? If not,
this may be something that needs to be taken care of.

Thanks,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-09-29 Thread Wouter Verhelst
Hi Josef,

On Wed, Sep 28, 2016 at 04:01:32PM -0400, Josef Bacik wrote:
> NBD can become contended on its single connection.  We have to serialize all
> writes and we can only process one read response at a time.  Fix this by
> allowing userspace to provide multiple connections to a single nbd device.
> This coupled with block-mq drastically increases performance in
> multi-process cases.  Thanks,

This reminds me: I've been pondering this for a while, and I think there
is no way we can guarantee the correct ordering of FLUSH replies in the
face of multiple connections, since a WRITE reply on one connection may
arrive before a FLUSH reply on another which it does not cover, even if
the server has no cache coherency issues otherwise.

Having said that, there can certainly be cases where that is not a
problem, and where performance considerations are more important than
reliability guarantees; so once this patch lands in the kernel (and the
necessary support patch lands in the userland utilities), I think I'll
just update the documentation to mention the problems that might ensue,
and be done with it.

I can see only a few ways in which to potentially solve this problem:
- Kernel-side nbd-client could send a FLUSH command over every channel,
  and only report successful completion once all replies have been
  received (a sketch of this follows below). This might negate some of
  the performance benefits, however.
- Multiplexing commands over a single connection (perhaps an SCTP one,
  rather than TCP); this would require some effort though, as you said,
  and would probably complicate the protocol significantly.
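
The first of those two options would look roughly like this (a sketch
only; the helpers are made up, and the real kernel plumbing is more
subtle):

#include <stdbool.h>
#include <stddef.h>

struct nbd_conn;

/* Hypothetical helpers, not from the actual driver. */
bool send_flush(struct nbd_conn *conn);
bool wait_flush_reply(struct nbd_conn *conn);

/* Fan one flush out to every connection, and only report completion
 * once every connection has replied. */
static bool flush_all_channels(struct nbd_conn **conns, size_t nconns)
{
	size_t i;

	for (i = 0; i < nconns; i++)
		if (!send_flush(conns[i]))
			return false;
	for (i = 0; i < nconns; i++)
		if (!wait_flush_reply(conns[i]))
			return false;
	return true;
}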

Regards,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-03 Thread Wouter Verhelst
Alex,
Christoph,

On Mon, Oct 03, 2016 at 12:34:33PM +0100, Alex Bligh wrote:
> On 3 Oct 2016, at 08:57, Christoph Hellwig  wrote:
> >> Can you clarify what you mean by that? Why is it an "odd flush
> >> definition", and how would you "properly" define it?
> > 
> > E.g. take the definition from NVMe which also supports multiple queues:
> > 
> > "The Flush command shall commit data and metadata associated with the
> > specified namespace(s) to non-volatile media. The flush applies to all
> > commands completed prior to the submission of the Flush command.

Oh, prior to submission? Okay, that means I must have misunderstood what
you meant before; in that case there shouldn't be a problem.

> > The controller may also flush additional data and/or metadata from any
> > namespace."
> > 
> > The focus is completed - we need to get a reply to the host first
> > before we can send the flush command, so anything that we require
> > to be flushed needs to explicitly be completed first.
> 
> I think there are two separate issues here:
> 
> a) What's described as the "HOL blocking issue". 
> 
> This comes down to what Wouter said here:
> 
> > Well, when I asked earlier, Christoph said[1] that blk-mq assumes that
> > when a FLUSH is sent over one channel, and the reply comes in, that all
> > commands which have been received, regardless of which channel they were
> > received over, have reached disk.
> > 
> > [1] Message-ID: <20160915122304.ga15...@infradead.org>
> > 
> > It is impossible for nbd to make such a guarantee, due to head-of-line
> > blocking on TCP.
> 
> this is perfectly accurate as far as it goes, but this isn't the current
> NBD definition of 'flush'.

I didn't read it that way.

> That is (from the docs):
> 
> > All write commands (that includes NBD_CMD_WRITE, and NBD_CMD_TRIM)
> > that the server completes (i.e. replies to) prior to processing a
> > NBD_CMD_FLUSH MUST be written to non-volatile storage prior to
> > replying to that NBD_CMD_FLUSH.

This is somewhat ambiguous, in that (IMO) it doesn't clearly state the
point where the cutoff of "may not be on disk yet" is. What is
"processing"? We don't define that, and therefore it could be any point
between "receipt of the request message" and "sending the reply
message". I had interpreted it closer to the latter than was apparently
intended, but that isn't very useful; I see now that it should be closer
to the former; a more useful definition is probably something along the
following lines:

All write commands (that includes NBD_CMD_WRITE and NBD_CMD_TRIM)
for which a reply was received on the client side prior to the
transmission of the NBD_CMD_FLUSH message MUST be written to
non-volatile storage prior to replying to that NBD_CMD_FLUSH. A
server MAY process this command in ways that result in committing more
data to non-volatile storage than is strictly required.

[...]
> I don't think there is actually a problem here - Wouter if I'm wrong
> about this, I'd like to understand your argument better.

No, I now see that there isn't, and I misunderstood things. However, I
do think we should update the spec to clarify this.

> b) What I'm describing - which is the lack of synchronisation between
> channels.
[... long explanation snipped...]

Yes, and I acknowledge that. However, I think that should not be a
blocker. It's fine to mark this feature as experimental; it will not
ever be required to use multiple connections to connect to a server.

When this feature lands in nbd-client, I plan to ensure that the man
page and -help output says something along the following lines:

 use N connections to connect to the NBD server, improving performance
 at the cost of a possible loss of reliability.

 The interactions between multiple connections and the NBD_CMD_FLUSH
 command, especially when the actual storage and the NBD server are not
 physically on the same machine, are not currently well defined and not
 completely understood, and therefore the use of multiple connections to
 the same server could theoretically lead to data corruption and/or
 loss. Use with caution.

This probably needs some better wording, but you get the idea.

(also, this will interact really really badly with the reference
implementation's idea of copy-on-write, I suppose, since that implements
COW on a per-socket basis with a per-IP diff file...)

Anyway, the point is that this is a feature which may still cause
problems in a number of edge cases and which should therefore not be the
default yet, but which can be useful in a number of common situations
for which NBD is used today.

[...]
> Now, in the reference server, NBD_CMD_FLUSH is implemented through an
> fdatasync().

Actually, no, the reference server uses fsync() for reasons that I've
forgotten (side note: you wrote it that way ;-)).

[...]

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-03 Thread Wouter Verhelst
On Sun, Oct 02, 2016 at 05:17:14PM +0100, Alex Bligh wrote:
> On 29 Sep 2016, at 17:59, Josef Bacik  wrote:
> > Huh I missed that.  Yeah that's not possible for us for sure, I think my
> > option idea is the less awful way forward if we want to address that
> > limitation.  Thanks,
> 
> I think if the server supports flush (which you can tell), sending flush on
> all channels is the only safe thing to do, without substantial protocol
> changes (which I'm not sure how one would do given flush is in a sense
> a synchronisation point). I think it's thus imperative this gets fixed
> before the change gets merged.

Whoa there, Alex.

I don't think this should be a blocker. There is a theoretical problem
yes, but I believe it to be limited to the case where the client and the
server are not in the same broadcast domain, which is not the common
case (most NBD connections run either over the localhost iface, or to a
machine nearby). In the case where the client and server are on the same
LAN, random packet drop is highly unlikely, so TCP communication will
not be delayed and so the replies will, with high certainty, arrive in
the same order that they were sent.

Obviously the documentation for the "spawn multiple connections" option
in nbd-client needs to clearly state that it will decrease reliability
in this edge case, but I don't think that blocking this feature until a
solution for this problem is implemented is the right way forward. There
are valid use cases where using multiple connections is preferable, even
with the current state of affairs, and they do not all involve "switch
off flush".

Regards,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-03 Thread Wouter Verhelst
On Mon, Oct 03, 2016 at 12:20:49AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 03, 2016 at 01:47:06AM +, Josef Bacik wrote:
> > It's not "broken", it's working as designed, and any fs on top of this
> > patch will be perfectly safe because they all wait for their io to complete
> > before issuing the FLUSH.  If somebody wants to address the paranoid case
> > later then all the power to them, but this works for my use case and isn't
> > inherently broken.  If it doesn't work for yours then don't use the
> > feature, it's that simple.  Thanks,
> 
> Let's take one step back here.  I agree with Josef that sending
> one single flush is perfectly fine for all usual cases.  The issue
> that was brought up last time we had this discussion was that some
> (I think mostly theoretical) backends could not be coherent and
> this would be an issue.

Actually, I was pointing out the TCP head-of-line issue, where a delay
on the socket that contains the flush reply would result in a write
reply arriving in the kernel block layer before the said flush reply,
so that a write is considered part of the flush when in fact it was
not.

This is an edge case, and one highly unlikely to result in problems in
the common case (as per my other mail), but it is something to consider.

> So maybe the right way is to simply not support the current odd flush
> defintion in the kernel then and require a new properly defined flush
> version instead.

Can you clarify what you mean by that? Why is it an "odd flush
definition", and how would you "properly" define it?

Thanks,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [PATCH][V4] nbd: add multi-connection support

2016-11-22 Thread Wouter Verhelst
Hi Josef,

[cc to nbd-general added]

On Thu, Nov 17, 2016 at 03:27:30PM -0500, Josef Bacik wrote:
> NBD can become contended on its single connection.  We have to serialize all
> writes and we can only process one read response at a time.  Fix this by
> allowing userspace to provide multiple connections to a single nbd device.
> This coupled with block-mq drastically increases performance in
> multi-process cases.  Thanks,
> 
> Signed-off-by: Josef Bacik 
> ---
> V3->V4:
> -Fixed a problem where fast completes (or early completes) would crash because
>  we were still accessing the bio's on the submit side.
> -Added a flag to disallow multi-connection support if the server doesn't
>  explicitly allow for them.
[...]
> +	if (num_connections > 1 &&
> +	    !(nbd->flags & NBD_FLAG_CAN_MULTI_CONN)) {
> +		dev_err(disk_to_dev(nbd->disk), "server does not support multiple connections per device.\n");
> +		goto out_err;
> +	}

I'm not sure whether the kernel needs to check this. I agree that the
flag can be useful, but it's probably something for userspace to check,
rather than for the kernel; I could imagine a --force parameter to be
useful in some corner cases.

Having said that, implementing such a parameter could also be done by
artificially adding particular flags, so I'm certainly not opposed to
this.

[...]
> +#define NBD_FLAG_CAN_MULTI_CONN	(1 << 6)	/* Server supports multiple connections per export. */

NAK, that is already specified in the protocol spec at
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md to be
NBD_FLAG_SEND_WRITE_ZEROES. Bit 7 is also taken already, so please use
bit 8 instead.

(I'll reserve that bit in that document to be "export is multi-conn safe" in a
minute)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 00/12] nbd: Netlink interface and path failure enhancements

2017-04-07 Thread Wouter Verhelst
Hi Josef,

On Thu, Apr 06, 2017 at 05:05:25PM -0400, Josef Bacik wrote:
> 
> > On Apr 6, 2017, at 5:01 PM, Josef Bacik  wrote:
> > 
> > This patchset adds a new netlink configuration interface to NBD as well as
> > a bunch of enhancements around path failures.  The patches provide the
> > following enhancements to NBD:
> > 
> > - Netlink configuration interface that doesn't leave a userspace application
> >   waiting in kernel space for the device to disconnect.
> > - Netlink reconfigure interface for adding re-connected sockets to replace
> >   dead sockets.
> > - A flag to destroy the NBD device on disconnect, much like how mount -o
> >   loop works.
> > - A status interface that currently will only report whether a device is
> >   connected or not, but can be extended to include whatever in the future.
> > - A netlink multicast notification scheme to notify user space when there
> >   are connection issues, to allow for seamless reconnects.
> > - Dead link handling.  You can specify a dead link timeout and the NBD
> >   device will pause IO for that timeout, waiting to see if the connection
> >   can be re-established.  This is helpful to allow for things like nbd
> >   server upgrades where the whole server disappears for a short period of
> >   time.
> > 
> > These patches have been thoroughly and continuously tested for about a
> > month.  I've been finding bugs in various places, but this batch has been
> > solid for the last few days of testing, which includes a constant
> > disconnect/reconnect torture test.  Thanks,
> 
> Oops forgot to mention that I have patches for the nbd user space side
> that utilizes all of these new interfaces, you can find the tree here
> 
> https://github.com/josefbacik/nbd

Yeah, found that already :-)

> The user space patches are a bit rough, but utilize everything so good
> for testing. They will be cleaned up and submitted once the kernel
> part is upstream. Thanks,

No worries, I'll probably clean it up myself. TBH, cleaning up
nbd-client so it becomes somewhat less of a beast has been on my TODO
list for a while, and this will only add to it... maybe that'll finally
convince me that it's time ;-)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 6/6] nbd: add a basic netlink interface

2017-03-08 Thread Wouter Verhelst
On Wed, Mar 08, 2017 at 09:56:48AM -0500, Josef Bacik wrote:
> On Wed, 2017-03-08 at 11:07 +0100, Wouter Verhelst wrote:
> > On Tue, Feb 28, 2017 at 11:57:11AM -0500, Josef Bacik wrote:
> > > 
> > > The existing ioctl interface for configuring NBD devices is a bit
> > > cumbersome and hard to extend.  The other problem is we leave a
> > > userspace app sitting in its syscall until the device disconnects,
> > > which is less than ideal.
> > True.
> > 
> > On the other hand, it has the advantage that you leave a userspace app
> > sitting around until the device disconnects, which allows for some form
> > of recovery in case you're doing root-on-NBD and the server has a
> > hiccup. Don't underestimate the advantage of that.
> > 
> > (of course, that requires that the return value of NBD_DO_IT makes a
> > difference between "unexpected connection drop" and "we sent
> > NBD_CMD_DISC", but that's a different matter entirely)
> 
> Stay tuned for further developments ;).

Heh.

> Yeah the problem is that even
> though we can return and allow the user to reconnect, we completely
> tear down the device and will return EIO to anything that comes in
> while we're reconnecting, which sucks for users.

Quite, yes.

> The patches that I'm testing now will multi-cast messages over netlink
> when a link goes down so a user space application can reconnect and
> provide a new connection seamlessly.  The next step after that is to
> allow a complete failure of all connections and we will simply sit
> there and queue IO until userspace reconnects or the configured
> timeout elapses at which point we'll tear down the device.  The end
> goal of all of this is seamless reconnects without throwing errors.

Awesome. That would mean userspace would need to sit around, but I
suppose that isn't something we can't live with (and actually has other
advantages, too).

Thanks,

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 6/6] nbd: add a basic netlink interface

2017-03-08 Thread Wouter Verhelst
On Tue, Feb 28, 2017 at 11:57:11AM -0500, Josef Bacik wrote:
> The existing ioctl interface for configuring NBD devices is a bit
> cumbersome and hard to extend.  The other problem is we leave a
> userspace app sitting in its syscall until the device disconnects,
> which is less than ideal.

True.

On the other hand, it has the advantage that you leave a userspace app
sitting around until the device disconnects, which allows for some form
of recovery in case you're doing root-on-NBD and the server has a
hiccup. Don't underestimate the advantage of that.

(of course, that requires that the return value of NBD_DO_IT makes a
difference between "unexpected connection drop" and "we sent
NBD_CMD_DISC", but that's a different matter entirely)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 3/4] treewide: convert PF_MEMALLOC manipulations to new helpers

2017-04-06 Thread Wouter Verhelst
On Wed, Apr 05, 2017 at 01:30:31PM +0200, Michal Hocko wrote:
> On Wed 05-04-17 09:46:59, Vlastimil Babka wrote:
> > We now have memalloc_noreclaim_{save,restore} helpers for robust setting and
> > clearing of PF_MEMALLOC. Let's convert the code which was using the generic
> > tsk_restore_flags(). No functional change.
> 
> It would be really great to revisit why those places outside of the mm
> proper really need this flag. I know this is a painful exercise but I
> wouldn't be surprised if there were abusers there.
[...]
> > ---
> >  drivers/block/nbd.c  | 7 ++++---
> >  drivers/scsi/iscsi_tcp.c | 7 ++++---
> >  net/core/dev.c   | 7 ++++---
> >  net/core/sock.c  | 7 ++++---
> >  4 files changed, 16 insertions(+), 12 deletions(-)

These were all done to make swapping over network safe. The idea is that
if a socket has SOCK_MEMALLOC set, incoming packets for that socket can
access PFMEMALLOC reserves (whereas other sockets cannot); this all in
the hope that one packet destined for that socket will contain the TCP
ACK that confirms the swapout was successful and we can now release RAM
pages for other processes.

I don't know whether they need the PF_MEMALLOC flag specifically (not a
kernel hacker), but they do need to interact with it at any rate.
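
Concretely, the pattern as used in nbd's transmit path is, roughly (a
sketch using the new helpers from this series):

#include <linux/sched/mm.h>
#include <net/sock.h>

/* Mark the transport socket so packets destined for it may dip into
 * the PFMEMALLOC reserves. */
static void mark_memalloc(struct socket *sock)
{
	sk_set_memalloc(sock->sk);
}

/* Run the send/receive path with PF_MEMALLOC set, so that it cannot
 * recurse into reclaim while swap pages are being written out. */
static int do_transmit(struct socket *sock)
{
	unsigned long pflags = memalloc_noreclaim_save();
	int ret = 0;

	/* ... kernel_sendmsg()/kernel_recvmsg() on sock ... */

	memalloc_noreclaim_restore(pflags);
	return ret;
}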

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12


Re: [Nbd] [PATCH 1/3] nbd: allow multiple disconnects to be sent

2017-07-25 Thread Wouter Verhelst
On Mon, Jul 24, 2017 at 07:24:30PM -0400, Josef Bacik wrote:
> On Mon, Jul 24, 2017 at 10:08:21PM +0200, Wouter Verhelst wrote:
> > On Fri, Jul 21, 2017 at 10:48:13AM -0400, jo...@toxicpanda.com wrote:
> > > From: Josef Bacik <jba...@fb.com>
> > > 
> > > There's no reason to limit ourselves to one disconnect message per
> > > socket.  Sometimes networks do strange things, might as well let
> > > sysadmins hit the panic button as much as they want.
> > 
> > The protocol spec is pretty clear that any requests sent after the
> > disconnect request was sent out are not guaranteed to be processed
> > anymore.
> > 
> > Doesn't this allow more requests to be sent out? Or is the
> > NBD_DISCONNECT_REQUESTED flag enough to make that impossible?
> > 
> 
> This just allows users to call the disconnect ioctl/netlink thing multiple
> times and have it send the DISCONNECT command if they want.

Right.

> We've had problems with our in-house nbd server missing messages,

That's pretty bad...

> and it's just a pain to have to unstuck it because the server messed up.
> It's just for the rare case the server is being weird, not because we
> expect/guarantee that subsequent disconnect commands will be processed.
> Thanks,

Okay, makes sense. Just thought I'd ask :-)

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab


Re: [Nbd] [PATCH 1/3] nbd: allow multiple disconnects to be sent

2017-07-24 Thread Wouter Verhelst
On Fri, Jul 21, 2017 at 10:48:13AM -0400, jo...@toxicpanda.com wrote:
> From: Josef Bacik 
> 
> There's no reason to limit ourselves to one disconnect message per
> socket.  Sometimes networks do strange things, might as well let
> sysadmins hit the panic button as much as they want.

The protocol spec is pretty clear that any requests sent after the
disconnect request was sent out are not guaranteed to be processed
anymore.

Doesn't this allow more requests to be sent out? Or is the
NBD_DISCONNECT_REQUESTED flag enough to make that impossible?

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab


[Nbd] [PATCH] MAINTAINERS: update list for NBD

2017-09-22 Thread Wouter Verhelst
nbd-gene...@sourceforge.net becomes n...@other.debian.org, because
sourceforge is just a spamtrap these days.

Signed-off-by: Wouter Verhelst <w...@uter.be>
Reviewed-by: Josef Bacik <jba...@fb.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index fbb269415f06..2f8824995a93 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9326,7 +9326,7 @@ NETWORK BLOCK DEVICE (NBD)
 M: Josef Bacik <jba...@fb.com>
 S: Maintained
 L: linux-block@vger.kernel.org
-L: nbd-gene...@lists.sourceforge.net
+L: n...@other.debian.org
 F: Documentation/blockdev/nbd.txt
 F: drivers/block/nbd.c
 F: include/uapi/linux/nbd.h
-- 
2.14.1


Re: [PATCH] nbd: set discard_alignment to the granularity

2018-06-12 Thread Wouter Verhelst
Hi Josef,

On Tue, Jun 05, 2018 at 11:41:23AM -0400, Josef Bacik wrote:
> Technically we should be able to get away with 0 as the
> discard_alignment, but there's no way currently for the protocol to
> indicate different alignments,

Actually there is, with the NBD_INFO_BLOCK_SIZE (and related)
response(s) to the NBD_OPT_GO message during negotiation, but
implementing that will probably require some more netlink messages.
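
For reference, that information is carried as an NBD_REP_INFO reply of
type NBD_INFO_BLOCK_SIZE during NBD_OPT_INFO/NBD_OPT_GO; a sketch of the
wire layout (all fields big-endian on the wire):

#include <stdint.h>

/* Payload of an NBD_REP_INFO reply of type NBD_INFO_BLOCK_SIZE. */
struct nbd_info_block_size {
	uint16_t info_type;       /* NBD_INFO_BLOCK_SIZE */
	uint32_t min_block;       /* minimum block size, e.g. 512 */
	uint32_t preferred_block; /* preferred block size, e.g. 4096 */
	uint32_t max_block;       /* maximum payload size per request */
} __attribute__((packed));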

(some of this is still being hashed out on the NBD mailinglist; I'm sure
your insights in that would be welcome)

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab