Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-26 Thread Christoph Hellwig
On Thu, Jan 26, 2017 at 10:17:58AM +0100, Greg KH wrote:
> Ok, but do you feel the "loop method" of using a char device node to
> create/control these devices is a good model to follow for new devices
> like nbd?

Yes. We've done the same for NVMe over fabrics.
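
For reference, the loop-control model being discussed looks roughly like
this from userspace: a control node hands out devices on demand instead of
the caller probing /dev/loopN nodes.  (A minimal sketch using the existing
loop ioctls - this shows the loop analogue, not the proposed nbd-control
interface.)

#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        /* Ask the control node for a free device; the kernel creates
         * /dev/loopN on demand and returns its index. */
        int ctl = open("/dev/loop-control", O_RDWR);
        if (ctl < 0) {
                perror("open /dev/loop-control");
                return 1;
        }

        int idx = ioctl(ctl, LOOP_CTL_GET_FREE);
        if (idx < 0) {
                perror("LOOP_CTL_GET_FREE");
                close(ctl);
                return 1;
        }

        printf("allocated /dev/loop%d\n", idx);
        close(ctl);
        return 0;
}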



Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-26 Thread Christoph Hellwig
On Wed, Jan 25, 2017 at 03:36:20PM -0600, Eric Blake wrote:
> How do you get an fd to existing nbd block device?  Your intent is to
> use an ioctl to request creating/opening a new nbd device that no one
> else is using; opening an existing device in order to send that ioctl
> may have negative ramifications to the actual user of that existing
> device, if not permissions issues that prevent the open from even
> happening.  Having a separate control fd makes it obvious that you are
> asking for a new nbd device, and don't want to stomp on any existing
> devices.

Yes - this whole concept of needing a device first and then associating
it with a backing store is something we had in a lot of drivers,
and it's always been a major problem.  Thus we've been slowly moving
all the drivers off it.



Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-03 Thread Christoph Hellwig
On Mon, Oct 03, 2016 at 09:51:49AM +0200, Wouter Verhelst wrote:
> Actually, I was pointing out the TCP head-of-line issue, where a delay
> on the socket that contains the flush reply would result in the arrival
> in the kernel block layer of a write reply before the said flush reply,
> resulting in a write being considered part of the flush when in fact it
> was not.

The kernel (or any other user of SCSI/ATA/NVMe-like cache flushes)
will explicitly wait for the completion of all I/O that needs to be
covered by the flush, so this is not a problem.

> Can you clarify what you mean by that? Why is it an "odd flush
> definition", and how would you "properly" define it?

Take, for example, the definition from NVMe, which also supports multiple queues:

"The Flush command shall commit data and metadata associated with the
specified namespace(s) to non-volatile media. The flush applies to all
commands completed prior to the submission of the Flush command.
The controller may also flush additional data and/or metadata from any
namespace."

The key word is "completed" - we need to get a reply to the host first
before we can send the flush command, so anything that we require
to be flushed needs to be explicitly completed first.




Re: [Nbd] [PATCH][V3] nbd: add multi-connection support

2016-10-03 Thread Christoph Hellwig
On Mon, Oct 03, 2016 at 01:47:06AM +, Josef Bacik wrote:
> It's not "broken", it's working as designed, and any fs on top of this patch 
> will be perfectly safe because they all wait for their io to complete before 
> issuing the FLUSH.  If somebody wants to address the paranoid case later then 
> all the power to them, but this works for my use case and isn't inherently 
> broken.  If it doesn't work for yours then don't use the feature, it's that 
> simple.  Thanks,

Let's take one step back here.  I agree with Josef that sending
one single flush is perfectly fine for all usual cases.  The concern
that was brought up last time we had this discussion was that some
(I think mostly theoretical) backends might not be coherent, and
that would be a problem.

So maybe the right way is to simply not support the current odd flush
definition in the kernel, and to require a new, properly defined flush
version instead.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 01:39:11PM +0100, Alex Bligh wrote:
> That's probably right in the case of file-based back ends that
> are running on a Linux OS. But gonbdserver for instance supports
> (e.g.) Ceph based backends, where each connection might be talking
> to a completely separate ceph node, and there may be no cache
> consistency between connections.

Yes - if you don't have a cache-coherent backend you are generally
screwed with a multiqueue protocol.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 01:33:20PM +0100, Alex Bligh wrote:
> At an implementation level that is going to be a little difficult
> for some NBD servers, e.g. ones that fork() a different process per
> connection. There is in general no IPC to speak of between server
> instances. Such servers would thus be unsafe with more than one
> connection if FLUSH is in use.
> 
> I believe such servers include the reference server where there is
> process per connection (albeit possibly with several threads).
> 
> Even single process servers (including mine - gonbdserver) would
> require logic to pair up multiple connections to the same
> device.

Why?  If you only send the completion after your I/O syscall returned,
you are fine even if the fsync comes from a different process, no matter
whether you're using direct or buffered I/O underneath.
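
A toy userspace demonstration of the underlying property (not nbd code;
it only shows that the page cache is shared, so a flush issued by one
process covers writes another process has already completed):

#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        const char *path = "shared.img";

        if (fork() == 0) {                      /* "connection" process A */
                int fd = open(path, O_WRONLY | O_CREAT, 0644);
                (void)pwrite(fd, "data", 4, 0); /* the reply would only be
                                                 * sent after this returns */
                close(fd);
                _exit(0);
        }
        wait(NULL);                             /* A's write has completed */

        int fd = open(path, O_WRONLY);          /* "connection" process B */
        if (fd >= 0 && fdatasync(fd) == 0)      /* covers A's data as well */
                printf("flush from a different process worked\n");
        if (fd >= 0)
                close(fd);
        return 0;
}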



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 02:21:20PM +0200, Wouter Verhelst wrote:
> Right. So do I understand you correctly that blk-mq currently doesn't
> look at multiple queues, and just assumes that if a FLUSH is sent over
> any one of the queues, it applies to all queues?

Yes.  The same is true at the protocol level for NVMe or SCSI transports
that can make use of multiple queues.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 02:01:59PM +0200, Wouter Verhelst wrote:
> Yes. There was some discussion on that part, and we decided that setting
> the flag doesn't hurt, but the spec also clarifies that using it on READ
> does nothing, semantically.
>
> 
> The problem is that there are clients in the wild which do set it on
> READ, so it's just a matter of "be liberal in what you accept".

Note that FUA on READ in SCSI and NVMe does have a meaning - it
requires you to bypass any sort of cache on the target.  I think it's a
wrong definition because it mandates implementation details that aren't
observable by the initiator, but it's still the spec wording and nbd
diverges from it.  That's not necessarily a bad thing, but a caveat to
look out for.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 01:11:24PM +0100, Alex Bligh wrote:
> > NBD_CMD_FLUSH (3)
> > 
> > A flush request; a write barrier. 
> 
> I can see that's potentially confusing as isn't meant to mean 'an old-style
> linux kernel block device write barrier'. I think in general terms it
> probably is some form of barrier, but I see no problem in deleting the
> words "a write barrier" from the spec text if only to make it
> clearer.

Yes, please do that.  A "barrier" implies draining of the queue.

> However, I think the description of the command itself:
> 
> > The server MUST NOT send a successful reply header for this request before 
> > all write requests for which a reply has already been sent to the client 
> > have reached permanent storage (using fsync() or similar).
> 
> and the ordering section I pointed you to before, were both correct, yes?

Yes, this seems correct.

> actually fdatasync() technically does more than is necessary, as it
> will also flush commands that have been processed, but for which no
> reply has yet been sent - that's no bad thing.

Yes.  But without an actual barrier it's hard to be exact - and
fdatasync does the right thing by including false positives.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 01:55:14PM +0200, Wouter Verhelst wrote:
> Maybe I'm not using the correct terminology here. The point is that
> after a FLUSH, the server asserts that all write commands *for which a
> reply has already been sent to the client* will also have reached
> permanent storage. Nothing is asserted about commands for which the
> reply has not yet been sent, regardless of whether they were sent to the
> server before or after the FLUSH; they may reach permanent storage as a
> result of the FLUSH, or they may not, we don't say.
> 
> With FUA, we only assert that the FUA-flagged command reaches permanent
> storage before its reply is sent, nothing else.

Yes, that's the correct semantics.

> If that's not a write barrier, then I was using the wrong terminology
> (and offer my apologies for the confusion).

It's not a write barrier - a write barrier was a command that ensured that

 a) all previous writes were completed to the host/client
 b) all previous writes were on non-volatile storage

and

 c) the actual write with the barrier bit was on non-volatile storage

> The point still remains that "X was sent before Y" is difficult to
> determine on the client side if X was sent over a different TCP channel
> than Y, because a packet might be dropped (requiring a retransmit) for
> X, and similar things. If blk-mq can deal with that, we're good and
> nothing further needs to be done. If not, this should be evaluated by
> someone more familiar with the internals of the kernel block subsystem
> than me.

The important bit in all the existing protocols, and the one Linux relies
on, is that any write the Linux block layer got a completion for earlier
needs to be flushed out to non-volatile storage when a FLUSH command is
sent.  Anything that is still in flight does not matter.  For NBD that
means anything you have already replied to needs to be flushed.

Or to put it more practically - in the nbd server you simply need to
call fdatasync on the backing device or file whenever you get a FLUSH
request, and it will do the right thing.
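
For a server backed by a plain file or block device that can be as small
as the following (a minimal sketch, not taken from any existing server;
handle_flush is a made-up helper and error mapping is simplified):

#include <errno.h>
#include <unistd.h>

/* Handle a FLUSH request: make everything already acknowledged durable.
 * fdatasync() may also flush writes we have not replied to yet, which is
 * a harmless false positive. */
static int handle_flush(int backing_fd)
{
        if (fdatasync(backing_fd) < 0)
                return EIO;     /* turn this into an error reply */
        return 0;
}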



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 12:46:07PM +0100, Alex Bligh wrote:
> Essentially NBD does supports FLUSH/FUA like this:
> 
> https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
> 
> IE supports the same FLUSH/FUA primitives as other block drivers (AIUI).
> 
> Link to protocol (per last email) here:
> 
> https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes

Flush as defined by the Linux block layer (and supported that way in
SCSI, ATA, NVMe) only requires flushing all already completed writes
to non-volatile media.  It does not impose any ordering, unlike the
nbd spec.

FUA as defined by the Linux block layer (and supported that way in SCSI,
ATA, NVMe) only requires the write operation the FUA bit is set on to be
on non-volatile media before completing the write operation.  It does
not impose any ordering, which seems to match the nbd spec.  Unlike the
NBD spec, Linux does not allow FUA to be set on anything but WRITE
commands.  Some other storage protocols allow a FUA bit on READ
commands or other commands that write data to the device, though.
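
In a driver this maps onto each request roughly as follows (a simplified
sketch, not the actual nbd.c code; nbd_cmd_from_request here is just an
illustrative helper):

#include <linux/blk-mq.h>
#include <linux/nbd.h>
#include <linux/types.h>

static u32 nbd_cmd_from_request(struct request *rq, bool *fua)
{
        /* FUA applies only to the single write it is set on. */
        *fua = !!(rq->cmd_flags & REQ_FUA);

        switch (req_op(rq)) {
        case REQ_OP_FLUSH:
                /* Empty flush: make previously completed writes durable.
                 * No ordering of in-flight requests is implied. */
                return NBD_CMD_FLUSH;
        case REQ_OP_WRITE:
                return NBD_CMD_WRITE;
        default:
                return NBD_CMD_READ;
        }
}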



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 12:43:35PM +0100, Alex Bligh wrote:
> Sure, it's at:
> 
> https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes
> 
> and that link takes you to the specific section.
> 
> The treatment of FLUSH and FUA is meant to mirror exactly the
> linux block layer (or rather how the linux block layer was a few
> years ago). I even asked on LKML to verify a few points.

Linux never expected ordering on the wire.  Before 2010 we had barriers
in the kernel that provided ordering to the caller, but we never
required it from the protocol / hardware.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 12:09:28PM +0100, Alex Bligh wrote:
> A more general point is that with multiple queues requests
> may be processed in a different order even by those servers that
> currently process the requests in strict order, or in something
> similar to strict order. The server is permitted by the spec
> (save as mandated by NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA) to
> process commands out of order anyway, but I suspect this has
> to date been little tested.

The Linux kernel does not assume any synchronization between block I/O
commands.  So any sort of synchronization a protocol does is complete
overkill for us.



Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Christoph Hellwig
On Thu, Sep 15, 2016 at 12:49:35PM +0200, Wouter Verhelst wrote:
> A while back, we spent quite some time defining the semantics of the
> various commands in the face of the NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
> write barriers. At the time, we decided that it would be unreasonable
> to expect servers to make these write barriers effective across
> different connections.

Do you have an nbd protocol specification?  Treating a flush or FUA
as any sort of barrier is incredibly stupid.  Is it really documented
that way, and if so, why?



Re: [Nbd] [PATCH 5/5] nbd: add multi-connection support

2016-09-10 Thread Christoph Hellwig
Hi Josef,

I haven't read the full patch as I'm a bit in a hurry, but is there
a good reason not to simply have a socket per hw_ctx and store it in
the hw_ctx private data instead of using the index in the nbd_cmd
structure?
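
Something along these lines, for instance (a rough sketch of the idea only;
nbd_dev_stub and its socks array are made-up stand-ins for the real driver
state):

#include <linux/blk-mq.h>
#include <linux/net.h>

struct nbd_dev_stub {
        struct socket **socks;  /* one connected socket per hardware queue */
};

static int nbd_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
                         unsigned int hctx_idx)
{
        struct nbd_dev_stub *nbd = data;        /* tag_set->driver_data */

        /* Stash the per-queue socket; queue_rq() can then use
         * hctx->driver_data directly instead of carrying an index
         * in each nbd_cmd. */
        hctx->driver_data = nbd->socks[hctx_idx];
        return 0;
}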



Re: [Nbd] [PATCH 2/5] nbd: Remove signal usage

2016-01-03 Thread Christoph Hellwig
Hi Markus,

this looks great!

Reviewed-by: Christoph Hellwig <h...@lst.de>

One thing I noticed, which might be a good cleanup in the future:

> -	spin_lock_irqsave(&nbd->tasks_lock, flags);
> 	nbd->task_recv = current;
> -	spin_unlock_irqrestore(&nbd->tasks_lock, flags);

It seems like task_{send,recv} are only used for the files showing
the pids.  Maybe you should just store the pids directly in the
nbd_device structure and also keep the sysfs file around for the
lifetime of that structure.
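
Roughly along these lines, for example (a hypothetical sketch - nbd_dev_stub
stands in for the real nbd_device and the attribute wiring is simplified):

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/types.h>

struct nbd_dev_stub {
        struct device dev;
        pid_t recv_pid;         /* recorded when the receiver starts */
        pid_t send_pid;
};

static ssize_t pid_show(struct device *dev, struct device_attribute *attr,
                        char *buf)
{
        struct nbd_dev_stub *nbd = container_of(dev, struct nbd_dev_stub, dev);

        /* Valid for the whole lifetime of the structure, not just while
         * the threads are running. */
        return sprintf(buf, "%d\n", nbd->recv_pid);
}
static DEVICE_ATTR_RO(pid);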



Re: [Nbd] [PATCH 1/1] kthread: introduce kthread_get_run() to fix __nbd_ioctl()

2015-10-26 Thread Christoph Hellwig
On Sun, Oct 25, 2015 at 03:27:13PM +0100, Oleg Nesterov wrote:
> It is not safe to use the task_struct returned by kthread_run(threadfn)
> if threadfn() can exit before the "owner" does kthread_stop(), nothing
> protects this task_struct.
> 
> So __nbd_ioctl() looks buggy; a killed nbd_thread_send() can exit, free
> its task_struct, and then kthread_stop() can use the freed/reused memory.
> 
> Add the new trivial helper, kthread_get_run(). Hopefully it will have more
> users, this patch changes __nbd_ioctl() as an example.

This looks horrible.  I think the real problem is that nbd is totally
abusing signals for kthreads and that needs to go away.



Re: [Nbd] [PATCH v3 0/7] block: reread partitions changes and fix for loop

2015-05-18 Thread Christoph Hellwig
This series looks good to me,

Reviewed-by: Christoph Hellwig <h...@lst.de>



Re: [Nbd] [PATCH v2 0/7] block: reread partitions changes and fix for loop

2015-04-13 Thread Christoph Hellwig
The series looks fine to me:

Reviewed-by: Christoph Hellwig <h...@lst.de>



Re: [Nbd] [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open

2015-04-06 Thread Christoph Hellwig
On Mon, Apr 06, 2015 at 12:28:22AM +0800, Ming Lei wrote:
> Another simpler way is to make lo_refcnt an atomic_t and remove
> lo_ctl_mutex in lo_open(), and freeze the request queue during clearing
> fd, and better to freeze the queue during setting fd too, so will update in
> v1 with this way.

Using an atomic_t sounds good to me.
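
Something like this for the open path, say (a simplified sketch, not the
actual loop patch; loop_dev_stub stands in for struct loop_device):

#include <linux/atomic.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

struct loop_dev_stub {
        atomic_t lo_refcnt;
};

static int lo_open_sketch(struct block_device *bdev, fmode_t mode)
{
        struct loop_dev_stub *lo = bdev->bd_disk->private_data;

        /* A plain reference count: no need to take lo_ctl_mutex here. */
        atomic_inc(&lo->lo_refcnt);
        return 0;
}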



Re: [Nbd] [PATCH 7/9] nbd: Remove fixme that was already fixed

2015-04-02 Thread Christoph Hellwig
On Thu, Apr 02, 2015 at 10:11:39AM +0200, Markus Pargmann wrote:
> +/*
> + * Forcibly shutdown the socket causing all listeners to error
> + */
>  static void sock_shutdown(struct nbd_device *nbd, int lock)
>  {
> -	/* Forcibly shutdown the socket causing all listeners
> -	 * to error
> -	 *
> -	 * FIXME: This code is duplicated from sys_shutdown, but
> -	 * there should be a more generic interface rather than
> -	 * calling socket ops directly here */
> 	if (lock)
> 		mutex_lock(&nbd->tx_lock);
> 	if (nbd->sock) {

Please also kill the conditional locking here in a future patch by
moving it into the caller.
