Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface
On Thu, Jan 26, 2017 at 10:17:58AM +0100, Greg KH wrote:
> Ok, but do you feel the "loop method" of using a char device node to
> create/control these devices is a good model to follow for new devices
> like nbd?

Yes.  We've done the same for NVMe over Fabrics.

___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general
Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface
On Wed, Jan 25, 2017 at 03:36:20PM -0600, Eric Blake wrote:
> How do you get an fd to an existing nbd block device? Your intent is to
> use an ioctl to request creating/opening a new nbd device that no one
> else is using; opening an existing device in order to send that ioctl
> may have negative ramifications for the actual user of that existing
> device, if not permissions issues that prevent the open from even
> happening. Having a separate control fd makes it obvious that you are
> asking for a new nbd device, and don't want to stomp on any existing
> devices.

Yes - this whole concept of needing a device first and only then
associating it with a backing store is something we have had in a lot of
drivers, and it's always been a major problem.  Thus we've been slowly
moving all the drivers off it.
Re: [Nbd] [PATCH][V3] nbd: add multi-connection support
On Mon, Oct 03, 2016 at 09:51:49AM +0200, Wouter Verhelst wrote:
> Actually, I was pointing out the TCP head-of-line issue, where a delay
> on the socket that contains the flush reply would result in the arrival
> in the kernel block layer of a write reply before the said flush reply,
> resulting in a write being considered part of the flush when in fact it
> was not.

The kernel (or any other user of SCSI/ATA/NVMe-like cache flushes) will
explicitly wait for the completion of all I/O that needs to be covered
by the flush, so this is not a problem.

> Can you clarify what you mean by that? Why is it an "odd flush
> definition", and how would you "properly" define it?

E.g. take the definition from NVMe, which also supports multiple queues:

"The Flush command shall commit data and metadata associated with the
specified namespace(s) to non-volatile media. The flush applies to all
commands completed prior to the submission of the Flush command. The
controller may also flush additional data and/or metadata from any
namespace."

The key word is "completed" - we need to have received a reply for a
write before we send the Flush command, so anything that we require to
be flushed needs to be explicitly completed first.
Re: [Nbd] [PATCH][V3] nbd: add multi-connection support
On Mon, Oct 03, 2016 at 01:47:06AM +, Josef Bacik wrote:
> It's not "broken", it's working as designed, and any fs on top of this
> patch will be perfectly safe because they all wait for their io to
> complete before issuing the FLUSH. If somebody wants to address the
> paranoid case later then all the power to them, but this works for my
> use case and isn't inherently broken. If it doesn't work for yours then
> don't use the feature, it's that simple. Thanks,

Let's take one step back here.  I agree with Josef that sending one
single flush is perfectly fine for all usual cases.  The issue that was
brought up the last time we had this discussion was that some (I think
mostly theoretical) backends might not be coherent, and this would be an
issue for them.  So maybe the right way is to simply not support the
current odd flush definition in the kernel then, and to require a new,
properly defined flush version instead.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 01:39:11PM +0100, Alex Bligh wrote:
> That's probably right in the case of file-based back ends that
> are running on a Linux OS. But gonbdserver for instance supports
> (e.g.) Ceph based backends, where each connection might be talking
> to a completely separate ceph node, and there may be no cache
> consistency between connections.

Yes, if you don't have a cache-coherent backend you are generally
screwed with a multiqueue protocol.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 01:33:20PM +0100, Alex Bligh wrote:
> At an implementation level that is going to be a little difficult
> for some NBD servers, e.g. ones that fork() a different process per
> connection. There is in general no IPC to speak of between server
> instances. Such servers would thus be unsafe with more than one
> connection if FLUSH is in use.
>
> I believe such servers include the reference server where there is
> process per connection (albeit possibly with several threads).
>
> Even single process servers (including mine - gonbdserver) would
> require logic to pair up multiple connections to the same
> device.

Why?  If you only send the completion after your I/O syscall has
returned, you are fine even if the fsync comes from a different process,
no matter whether you're using direct or buffered I/O underneath.
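The claim above - that a flush issued from a different process than the
writer is still safe - can be demonstrated in userspace, since the page
cache is shared system-wide rather than per-process.  A standalone
sketch (the file name and helper are made up for the demo, not from any
real nbd server):

```c
/* Demo: a forked child does the buffered write and exits (the point at
 * which a fork()-per-connection nbd server would send the completion);
 * the parent then calls fdatasync() on its own fd.  The flush covers
 * the other process's write - no IPC between server processes needed. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cross_process_flush_ok(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {                       /* the "connection" process */
        int wfd = open(path, O_WRONLY);
        ssize_t w = pwrite(wfd, "data", 4, 0);
        _exit(w == 4 ? 0 : 1);            /* "reply" only after pwrite returned */
    }

    int status;
    waitpid(pid, &status, 0);             /* write completed and replied to */
    if (status != 0)
        return 0;

    if (fdatasync(fd) != 0)               /* flush from a *different* process */
        return 0;

    char buf[4];                          /* shared page cache: data is visible */
    int ok = pread(fd, buf, 4, 0) == 4 && memcmp(buf, "data", 4) == 0;
    close(fd);
    unlink(path);
    return ok;
}
```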
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 02:21:20PM +0200, Wouter Verhelst wrote:
> Right. So do I understand you correctly that blk-mq currently doesn't
> look at multiple queues, and just assumes that if a FLUSH is sent over
> any one of the queues, it applies to all queues?

Yes.  The same is true at the protocol level for NVMe or SCSI transports
that can make use of multiple queues.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 02:01:59PM +0200, Wouter Verhelst wrote:
> Yes. There was some discussion on that part, and we decided that setting
> the flag doesn't hurt, but the spec also clarifies that using it on READ
> does nothing, semantically.
>
> The problem is that there are clients in the wild which do set it on
> READ, so it's just a matter of "be liberal in what you accept".

Note that FUA on READ in SCSI and NVMe does have a meaning - it requires
you to bypass any sort of cache on the target.  I think it's a wrong
definition because it mandates implementation details that aren't
observable by the initiator, but it's still the spec wording, and nbd
diverges from it.  That's not necessarily a bad thing, but a caveat to
look out for.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 01:11:24PM +0100, Alex Bligh wrote:
> > NBD_CMD_FLUSH (3)
> >
> > A flush request; a write barrier.
>
> I can see that's potentially confusing as it isn't meant to mean 'an
> old-style linux kernel block device write barrier'. I think in general
> terms it probably is some form of barrier, but I see no problem in
> deleting the words "a write barrier" from the spec text if only to make
> it clearer.

Yes, please do that.  A "barrier" implies draining of the queue.

> However, I think the description of the command itself:
>
> > The server MUST NOT send a successful reply header for this request
> > before all write requests for which a reply has already been sent to
> > the client have reached permanent storage (using fsync() or similar).
>
> and the ordering section I pointed you to before, were both correct, yes?

Yes, this seems correct.

> actually fdatasync() technically does more than is necessary, as it
> will also flush commands that have been processed, but for which no
> reply has yet been sent - that's no bad thing.

Yes.  But without an actual barrier it's hard to be exact - and
fdatasync() does the right thing by including false positives.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 01:55:14PM +0200, Wouter Verhelst wrote:
> Maybe I'm not using the correct terminology here. The point is that
> after a FLUSH, the server asserts that all write commands *for which a
> reply has already been sent to the client* will also have reached
> permanent storage. Nothing is asserted about commands for which the
> reply has not yet been sent, regardless of whether they were sent to
> the server before or after the FLUSH; they may reach permanent storage
> as a result of the FLUSH, or they may not, we don't say.
>
> With FUA, we only assert that the FUA-flagged command reaches permanent
> storage before its reply is sent, nothing else.

Yes, those are the correct semantics.

> If that's not a write barrier, then I was using the wrong terminology
> (and offer my apologies for the confusion).

It's not a write barrier - a write barrier was a command that ensured
that

 a) all previous writes were completed to the host/client,
 b) all previous writes were on non-volatile storage, and
 c) the actual write carrying the barrier bit was on non-volatile
    storage.

> The point still remains that "X was sent before Y" is difficult to
> determine on the client side if X was sent over a different TCP channel
> than Y, because a packet might be dropped (requiring a retransmit) for
> X, and similar things. If blk-mq can deal with that, we're good and
> nothing further needs to be done. If not, this should be evaluated by
> someone more familiar with the internals of the kernel block subsystem
> than me.

The important bit in all the existing protocols, and the one Linux
relies on, is that any write the Linux block layer got a completion for
earlier needs to be flushed out to non-volatile storage when a FLUSH
command is sent.  Anything still in flight does not matter.  For NBD
that means anything you have already replied to needs to be flushed.

Or to put it more practically: in the nbd server you simply need to call
fdatasync() on the backing device or file whenever you get a FLUSH
request, and it will do the right thing.
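The fdatasync() recipe above can be sketched in a few lines.  This is a
minimal illustration, not code from any real nbd server - the helper
names are made up; the only rule it encodes is that a write's reply goes
out strictly after pwrite() has returned, and a FLUSH is answered only
after fdatasync() has succeeded:

```c
/* Sketch of an nbd server's write and flush paths (hypothetical
 * helpers).  Because every completed write had its pwrite() return
 * before the reply was sent, fdatasync() covers exactly the writes
 * already replied to - plus possibly some still in flight, which is
 * harmless. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* returns 0 on success, an errno value otherwise; the reply to the
 * client may only be sent after this returns */
static int handle_nbd_write(int backing_fd, const void *buf, size_t len,
                            off_t off)
{
    ssize_t ret = pwrite(backing_fd, buf, len, off);
    if (ret < 0)
        return errno;
    return (size_t)ret == len ? 0 : EIO;
}

/* NBD_CMD_FLUSH: reply only after fdatasync() has returned */
static int handle_nbd_flush(int backing_fd)
{
    return fdatasync(backing_fd) == 0 ? 0 : errno;
}
```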
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 12:46:07PM +0100, Alex Bligh wrote:
> Essentially NBD does support FLUSH/FUA like this:
>
> https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
>
> IE supports the same FLUSH/FUA primitives as other block drivers (AIUI).
>
> Link to protocol (per last email) here:
>
> https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes

Flush as defined by the Linux block layer (and supported that way in
SCSI, ATA, NVMe) only requires flushing all already completed writes to
non-volatile media.  It does not impose any ordering, unlike the nbd
spec.

FUA as defined by the Linux block layer (and supported that way in SCSI,
ATA, NVMe) only requires the write operation that the FUA bit is set on
to be on non-volatile media before the write operation completes.  It
does not impose any ordering, which seems to match the nbd spec.  Unlike
the NBD spec, Linux does not allow FUA to be set on anything but WRITE
commands.  Some other storage protocols allow a FUA bit on READ commands
or other commands that write data to the device, though.
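The per-command FUA semantics described above translate naturally into
server code.  A hedged sketch (the helper is hypothetical, not from the
thread): only the flagged write itself must be durable before its
completion is sent, and no ordering against other in-flight commands is
implied.  Using fdatasync() after the write is coarser than strictly
required - it may flush other writes too - which is safe:

```c
/* Hypothetical server-side write path with Linux-style FUA: sync only
 * when the FUA flag is set on this particular command. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int handle_write_maybe_fua(int backing_fd, const void *buf,
                                  size_t len, off_t off, int fua)
{
    ssize_t ret = pwrite(backing_fd, buf, len, off);
    if (ret < 0 || (size_t)ret != len)
        return ret < 0 ? errno : EIO;
    if (fua && fdatasync(backing_fd) != 0)  /* coarse but correct */
        return errno;
    return 0;   /* reply may go out; with fua set the data is durable */
}
```

An alternative with the same effect is opening the backing file with
O_DSYNC, which makes every write behave as if FUA were set, at an
obvious throughput cost for non-FUA writes.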
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 12:43:35PM +0100, Alex Bligh wrote:
> Sure, it's at:
>
> https://github.com/yoe/nbd/blob/master/doc/proto.md#ordering-of-messages-and-writes
>
> and that link takes you to the specific section.
>
> The treatment of FLUSH and FUA is meant to mirror exactly the
> linux block layer (or rather how the linux block layer was a few
> years ago). I even asked on LKML to verify a few points.

Linux never expected ordering on the wire.  Before 2010 we had barriers
in the kernel that provided ordering to the caller, but we never
required it from the protocol / hardware.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 12:09:28PM +0100, Alex Bligh wrote:
> A more general point is that with multiple queues requests
> may be processed in a different order even by those servers that
> currently process the requests in strict order, or in something
> similar to strict order. The server is permitted by the spec
> (save as mandated by NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA) to
> process commands out of order anyway, but I suspect this has
> to date been little tested.

The Linux kernel does not assume any synchronization between block I/O
commands.  So any sort of synchronization a protocol does is complete
overkill for us.
Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
On Thu, Sep 15, 2016 at 12:49:35PM +0200, Wouter Verhelst wrote:
> A while back, we spent quite some time defining the semantics of the
> various commands in the face of the NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA
> write barriers. At the time, we decided that it would be unreasonable
> to expect servers to make these write barriers effective across
> different connections.

Do you have an nbd protocol specification?  Treating a flush or FUA as
any sort of barrier is incredibly stupid.  Is it really documented that
way, and if yes, why?
Re: [Nbd] [PATCH 5/5] nbd: add multi-connection support
Hi Josef,

I haven't read the full patch as I'm a bit in a hurry, but is there a
good reason not to simply have a socket per hw_ctx and store it in the
hw_ctx private data instead of using the index in the nbd_cmd structure?
Re: [Nbd] [PATCH 2/5] nbd: Remove signal usage
Hi Markus,

this looks great!

Reviewed-by: Christoph Hellwig <h...@lst.de>

One thing I noticed, which might be a good cleanup in the future:

> -	spin_lock_irqsave(&nbd->tasks_lock, flags);
>  	nbd->task_recv = current;
> -	spin_unlock_irqrestore(&nbd->tasks_lock, flags);

It seems like task_{send,recv} are only used for the files showing the
pids.  Maybe you should just store the pids directly in the nbd_device
structure and also keep the sysfs file around for the lifetime of that
structure.
Re: [Nbd] [PATCH 1/1] kthread: introduce kthread_get_run() to fix __nbd_ioctl()
On Sun, Oct 25, 2015 at 03:27:13PM +0100, Oleg Nesterov wrote:
> It is not safe to use the task_struct returned by kthread_run(threadfn)
> if threadfn() can exit before the "owner" does kthread_stop(), nothing
> protects this task_struct.
>
> So __nbd_ioctl() looks buggy; a killed nbd_thread_send() can exit, free
> its task_struct, and then kthread_stop() can use the freed/reused
> memory.
>
> Add the new trivial helper, kthread_get_run(). Hopefully it will have
> more users, this patch changes __nbd_ioctl() as an example.

This looks horrible.  I think the real problem is that nbd is totally
abusing signals for kthreads, and that needs to go away.
Re: [Nbd] [PATCH v3 0/7] block: reread partitions changes and fix for loop
This series looks good to me,

Reviewed-by: Christoph Hellwig <h...@lst.de>
Re: [Nbd] [PATCH v2 0/7] block: reread partitions changes and fix for loop
The series looks fine to me:

Reviewed-by: Christoph Hellwig <h...@lst.de>
Re: [Nbd] [PATCH 2/6] block: loop: don't hold lo_ctl_mutex in lo_open
On Mon, Apr 06, 2015 at 12:28:22AM +0800, Ming Lei wrote:
> Another simpler way is to make lo_refcnt an atomic_t and remove
> lo_ctl_mutex in lo_open(), and freeze the request queue during clearing
> fd, and better to freeze the queue during setting fd too, so will
> update in v1 with this way.

Using an atomic_t sounds good to me.
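The shape of the suggestion can be sketched in userspace, with C11
atomics standing in for the kernel's atomic_t (all names here are
illustrative, not the actual loop driver code): the open path takes its
reference with a lock-free increment so no mutex is needed in lo_open(),
and only the clear-fd path still has to freeze the queue and check the
count.

```c
/* Userspace analog of an atomic open refcount replacing mutex-protected
 * accounting.  atomic_fetch_add/sub return the *previous* value, so the
 * release path can detect the final drop without any lock. */
#include <assert.h>
#include <stdatomic.h>

struct loop_dev {
    atomic_int refcnt;
};

static void demo_lo_open(struct loop_dev *lo)
{
    atomic_fetch_add(&lo->refcnt, 1);          /* no mutex taken */
}

/* returns 1 when the last reference was dropped */
static int demo_lo_release(struct loop_dev *lo)
{
    return atomic_fetch_sub(&lo->refcnt, 1) == 1;
}

/* the clear-fd path would only proceed while it holds the sole
 * reference (and with the request queue frozen) */
static int demo_clr_fd_allowed(struct loop_dev *lo)
{
    return atomic_load(&lo->refcnt) <= 1;
}
```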
Re: [Nbd] [PATCH 7/9] nbd: Remove fixme that was already fixed
On Thu, Apr 02, 2015 at 10:11:39AM +0200, Markus Pargmann wrote:
> +/*
> + * Forcibly shutdown the socket causing all listeners to error
> + */
>  static void sock_shutdown(struct nbd_device *nbd, int lock)
>  {
> -	/* Forcibly shutdown the socket causing all listeners
> -	 * to error
> -	 *
> -	 * FIXME: This code is duplicated from sys_shutdown, but
> -	 * there should be a more generic interface rather than
> -	 * calling socket ops directly here */
>  	if (lock)
>  		mutex_lock(&nbd->tx_lock);
>  	if (nbd->sock) {

Please also kill the conditional locking here in a future patch by
moving it into the caller.