Re: The lamentation of proplib(3)
On Wed, Jan 29, 2014 at 03:05:41AM +, Mindaugas Rasiukevicius wrote:
> In this case, the proplib implementation has such major deficiencies that replacing it itself is a virtue. Fixing is not the case here, as it would basically mean rewriting.

Since this thread has died down again, here are some suggestions on how we should move forward:

(1) It seems the consensus over the last 5+ years is that the library we want is a data transfer library, not a data storage library. If you want the latter, we have libdb and sqlite. The difference here is whether the data you're handling lives primarily in the library or if you normally maintain it in your own representation and stuff it into the library only for transfer. (And pull it out immediately after receive.) If anyone disagrees violently with this conclusion it's time to speak up.

(2) We have not yet converged on a data model. The sane choices are, pretty much, (a) flat key-value store, (b) recursive tree of key-value stores, (c) relational tables, (d) property graph model, or *maybe* (e) rdf data model.

The sense I have from the various discussions is that (a) isn't enough for at least some of the applications we have. My guess at this point is that we want (b). The chief advantage of the more elaborate data models is that they allow sane encoding of more complicated data; also they have, or can have, simpler transfer encodings because they *aren't* recursive.

I don't think we have any immediate use for graph data, but I think we might in the future; e.g. save files for games, and also the device attachment tree in the kernel keeps threatening to turn into a graph. Similarly I don't see any immediate use for relational data, but we might in the future; in particular if we improve the nsswitch interface we might want to use a relational data transfer library to ship config data around. Inasmuch as Unix config data is tabular, and it isn't always key/value, a relational library would fit this case better.
However, I think what we should do for the moment is go with (b), and if we want a relational or graph library later, we can write another library with a similar interface -- unlike the current proplib the new library is supposed to be small and simple, and I don't see that having as many as three of them with different purposes would constitute a problem. And if we want we can always build them so they share enough of the transfer format that they can read each other's simple forms.

(3) It is clear that we need schema definitions and schema enforcement. I think this should be handled as follows: (let libfoo be the name of the replacement library)

(a) the machine readable form of a libfoo schema should itself be a libfoo blob;
(b) there should be a separate libfooschema library (or maybe libfoocheck, validate, whatever) that checks a libfoo blob against a libfoo schema blob;
(c) there should be a human-readable DDL and code that reads it into a libfoo schema blob (maybe in libfooschema, maybe in a third library);
(d) there should be a program that takes a libfoo schema (probably in the DDL form) and generates code that packs and unpacks libfoo blobs of this schema into real data structures;
(e) there should maybe also be a program that does this and generates code that reads and writes the transfer format directly without materializing a libfoo blob, but we don't need this up front.

All of this is reasonably straightforward, but it can't be written until we settle the other open issues.

(4) We need to converge on an API. I think what we should do is: everyone who has ideas on the subject design one, or collaborate with someone else who's designing one. Keep talking, and everybody steal everybody else's best ideas. :-) That way we can probably converge on a small number of alternatives. I committed a copy of my 2008 jetlib strawman proposal to othersrc, but it needs some attention before it can be a serious candidate. The FreeBSD libnv is another candidate.
(5) We need to converge on a transfer format. Nobody likes XML, so we should pick something else as the native form. I think we should seriously consider JSON and one of the saner RDF formats for text, and XDR and one or more ASN.1 protocols for binary, as well as perhaps others -- there is also a simple binary encoding in the jetlib code that we could adopt but it has nothing much to offer. And there are probably others. Nothing says we have to allow the full generality of whatever we choose, just as proplib doesn't allow general XML. Note that the new library has to be able to read and write compatible proplib XML bundles regardless, as standard compat obligations require that. It would be nice to be able to load transfer formats as plugins, including in the form of the library that gets compiled into the kernel. The jetlib code is structured so that this would be possible (but it doesn't directly
Re: The lamentation of proplib(3)
On Fri, Feb 07, 2014 at 06:28:59PM +, David Holland wrote:
> (1) It seems the consensus over the last 5+ years is that the library we want is a data transfer library, not a data storage library. If you want the latter, we have libdb and sqlite.

Agreed, but please avoid db(3) :)

> (2) We have not yet converged on a data model. The sane choices are, pretty much, (a) flat key-value store, (b) recursive tree of key-value stores, (c) relational tables, (d) property graph model, or *maybe* (e) rdf data model.

Typed, nested key-value. So basically (b).

> (3) It is clear that we need schema definitions and schema enforcement.

Efficient ways to impose limits at parse time would also fit into this area. Saying "I don't want input larger than 10KB" is a very blunt way to avoid resource exhaustion, as the typical problems are more granular.

Joerg
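To illustrate what "more granular" parse-time limits could look like, here is a minimal sketch in C. It assumes a decoder for a nested key-value format that calls a small budget tracker as it walks the input. All names (`parse_limits`, `parse_budget`, `budget_enter`, `budget_entry`) are hypothetical and belong to neither proplib nor libnv.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-parse limits, finer grained than one total-size cap. */
struct parse_limits {
	size_t max_depth;      /* deepest allowed nesting of dictionaries */
	size_t max_entries;    /* total key-value pairs in the whole tree */
	size_t max_value_len;  /* longest single string or binary value */
};

enum limit_status { LIMIT_OK = 0, LIMIT_DEPTH, LIMIT_ENTRIES, LIMIT_VALUE_LEN };

/* Running totals kept while one blob is being decoded. */
struct parse_budget {
	struct parse_limits lim;
	size_t depth;
	size_t entries;
};

void
budget_init(struct parse_budget *b, struct parse_limits lim)
{
	b->lim = lim;
	b->depth = 0;
	b->entries = 0;
}

/* Called when the decoder is about to open a nested dictionary. */
enum limit_status
budget_enter(struct parse_budget *b)
{
	if (b->depth >= b->lim.max_depth)
		return LIMIT_DEPTH;
	b->depth++;
	return LIMIT_OK;
}

/* Called when the decoder closes a nested dictionary. */
void
budget_leave(struct parse_budget *b)
{
	assert(b->depth > 0);
	b->depth--;
}

/* Called before the decoder allocates storage for one key-value pair. */
enum limit_status
budget_entry(struct parse_budget *b, size_t value_len)
{
	if (value_len > b->lim.max_value_len)
		return LIMIT_VALUE_LEN;
	if (b->entries >= b->lim.max_entries)
		return LIMIT_ENTRIES;
	b->entries++;
	return LIMIT_OK;
}
```

Because each check runs before the corresponding allocation, a decoder built this way can reject a hostile blob as soon as one limit is crossed, and it can report which limit that was.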
Re: The lamentation of proplib(3)
On Tue, 28 Jan 2014 21:36:48 + Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:
> libnv may be more type-safe as an API itself than proplib, but if we are going to seriously adopt something for formal protocols, it ought to have schemas that support enforcement in the C type system so that the protocols can have type-safe APIs too.

I have thought for a long time that a limited form of reflection should be added to C and C++ precisely to support serialization of simple data structures. I know we can't fix C here and now, nor even in time for what you propose. I'd like to ask, though, if you agree that the features I describe below would, were they available, be helpful to what you propose.

From XDR onward, serialization has been a persistent nuisance (so to speak) because the compiler discards the information needed to provide type-checked I/O. Every C struct can be described by a list of tuples, basically

	offset, name, type, length

and it *should* be possible to iterate over that list to perform I/O. All that is needed is for the compiler to capture that information in a table, and for the language to provide access to it.

It's already being done, just badly. Some of the information, such as offset, is weirdly available via macros. The rest is provided to the debugger in some horrible nonstandard way that is inaccessible to the program.

My question to you and the others participating in this thread is simple: Do you think such a table would both reduce the effort of writing I/O libraries and improve their correctness? I for one can imagine tying that information to writev/readv to programmatically generate iovec arrays and validate type information at read time.

It's not an idle question. It would be good to have a clear example use-case.

--jkl
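As a rough illustration of the (offset, name, type, length) table described above, the sketch below builds such a table by hand with `offsetof` and `sizeof`, then derives an iovec array from it. The struct `sensor_msg`, the `FIELD` macro, and `fields_to_iovec()` are hypothetical names invented for this example. In the proposal the compiler would emit the table itself; here it is written manually, which is exactly the duplication the proposal wants to remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

enum field_type { FT_U32, FT_U64, FT_CHARS };

/* One tuple per struct member: offset, name, type, length. */
struct field_desc {
	size_t offset;
	const char *name;
	enum field_type type;
	size_t length;
};

/* Hypothetical message struct used only for this example. */
struct sensor_msg {
	uint32_t id;
	uint64_t value;
	char label[16];
};

/* Hand-written stand-in for what the compiler would generate. */
#define FIELD(st, member, ft) \
	{ offsetof(st, member), #member, (ft), sizeof(((st *)0)->member) }

const struct field_desc sensor_msg_fields[] = {
	FIELD(struct sensor_msg, id, FT_U32),
	FIELD(struct sensor_msg, value, FT_U64),
	FIELD(struct sensor_msg, label, FT_CHARS),
};

#define SENSOR_MSG_NFIELDS \
	(sizeof(sensor_msg_fields) / sizeof(sensor_msg_fields[0]))

/*
 * Fill iov[] with one entry per described field, so that padding
 * between members is never sent. Returns the total payload size.
 * The caller must supply at least nfields iovec slots.
 */
size_t
fields_to_iovec(void *obj, const struct field_desc *fd, size_t nfields,
    struct iovec *iov)
{
	size_t total = 0;

	assert(obj != NULL && fd != NULL && iov != NULL);
	for (size_t i = 0; i < nfields; i++) {
		iov[i].iov_base = (char *)obj + fd[i].offset;
		iov[i].iov_len = fd[i].length;
		total += fd[i].length;
	}
	return total;
}
```

The resulting array could be passed to writev(2) on the sending side, or to readv(2) on the receiving side after the type and length columns have been compared against what the peer announced.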
Re: The lamentation of proplib(3)
On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
> and my own dissatisfaction has reached the point where I decided to raise the question. The question of replacing proplib(3) with a better library. There were ideas by some developers to write a new library from scratch. The FreeBSD project has recently developed a general purpose key-value pair library, which is quite similar to nvpair library in Solaris.

Isn't proplib(3) quite heavily used throughout the system, both kernel space and user space? It won't be a trivial task to fully make this change, is all I'm saying.

I say don't get rid of proplib(3) entirely, how about moving it to pkgsrc at least?

-Christian
Re: The lamentation of proplib(3)
On 28/01/2014 19:44, Mindaugas Rasiukevicius wrote:
> Hello,

Hi,

> Many developers have been dissatisfied with proplib(3) for quite a while and my own dissatisfaction has reached the point where I decided to raise the question. The question of replacing proplib(3) with a better library. There were ideas by some developers to write a new library from scratch. The FreeBSD project has recently developed a general purpose key-value pair library, which is quite similar to nvpair library in Solaris.
> [snip]
> Discuss.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014464.html
> [2] http://nxr.netbsd.org/xref/src-freebsd/lib/libnv/
> [3] http://www.freebsd.org/cgi/man.cgi?query=nv&manpath=FreeBSD+11-current

I agree on the ugliness/impracticality of proplib(3), and got myself bitten twice. So I am all for alternatives.

Firstly, one clarification: is your intent to /replace/ proplib(3) or to /provide/ a simpler interchange format for userland/kernel, and keep proplib(3) in its place for historical purposes? Replacement ain't that easy, proplib(3) is used throughout the tree in multiple places, and there are parts where a full-scale replacement is not trivial (quotas for one).

Secondly, there are tons of interchange formats out there. libnv is one more, with its original author stating that it is not really meant as a replacement for XML/JSON.

To bring some momentum to the discussion, here are some remarks regarding libnv:

- I like the idea of offering an easy way to pass forth file descriptors;
- the error handling is weak IMHO; given the potential large use of such a library, it should support richer semantics than a blunt errno (something equivalent to a gai_strerror(3) maybe);
- it does not seem to offer a way to serialize kernel shared structures easily. It is particularly convenient to have because it avoids user/kernel roundtrips when you want to expose kernel structures without syscall overhead (instead of playing with ioctl or low-level mmap).
Why did they consider rolling out libnv when there are alternatives like protocol buffers or thrift? Granted, those tools are meant for higher level languages and RPCs, but if NetBSD managed to use XML in kernel, I suppose those would fit too...

Cheers,

-- 
Jean-Yves Migeon
Re: The lamentation of proplib(3)
On 28/01/2014 19:44, Mindaugas Rasiukevicius wrote:
> [...]
> - Last but not least, it does not have awkward API naming, such as prop_data_create_data_nocopy() or prop_number_unsigned_integer_value().

I particularly agree on this one, hehe
Re: The lamentation of proplib(3)
On Jan 28, 7:40pm, Christian Koch wrote:
} On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
} and my own dissatisfaction has reached the point where I decided to raise
} the question. The question of replacing proplib(3) with a better library.
} There were ideas by some developers to write a new library from scratch.
} The FreeBSD project has recently developed a general purpose key-value pair
} library, which is quite similar to nvpair library in Solaris.
}
} Isn't proplib(3) quite heavily used throughout the system, both
} kernel space and user space? It won't be a trivial task to fully

It is.

} make this change, is all I'm saying.

Definitely. Also, nvlist doesn't address one of the significant uses of proplib.

} I say don't get rid of proplib(3) entirely, how about moving it
} to pkgsrc at least?

Something that is heavily used throughout the system cannot be moved to pkgsrc. Pkgsrc is an addon, not part of the base system. Thus nothing in the base system can be dependent upon pkgsrc to function.

}-- End of excerpt from Christian Koch
Re: The lamentation of proplib(3)
On Tue, Jan 28, 2014 at 09:36:48PM +, Taylor R Campbell wrote:
> I'm inclined to say we ought to use protocol buffers -- it supports C-enforceable schemas, has been widely adopted in the world, and satisfies more or less all your desiderata. Parts of the wire format are a little wacky, but whatever. The only trouble is we'd have to write a C-only implementation, but that shouldn't be too hard.

What do you mean by C-only? Even though Google's standard compiler generates C++, there is a compiler that generates pure C [1]. If you are referring to the language used to implement the compiler, then this is not an option, as that compiler is itself implemented in C++.

[1] https://code.google.com/p/protobuf-c/
Re: The lamentation of proplib(3)
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:
> I don't think there's much disagreement that proplib is wrong, but a proposal to replace it ought to include concrete examples of how current uses of proplib (or C structs or other wire data transmission formats) should be replaced, not just general murmuring that proplib sucks and there are better options. That said...

If you have taken a look at the manual page and the examples section, the API is straightforward. I do not think that we all need to hold our hands and learn together that prop_dictionary_create() would be replaced with nvlist_create(), prop_dictionary_set_uint64() with nvlist_add_number() and so on. The idea is to move to a different (just sane) implementation, not a different concept.

> I'm inclined to say we ought to use protocol buffers -- it supports C-enforceable schemas, has been widely adopted in the world, and satisfies more or less all your desiderata. Parts of the wire format are a little wacky, but whatever. The only trouble is we'd have to write a C-only implementation, but that shouldn't be too hard.

I am not against such an approach per se, but our needs (given the cases in the NetBSD tree) are much more modest than what Protocol Buffers were designed for. Most use cases in our tree are for data transfers between the user and the kernel (just as a more convenient alternative to ioctls + structs and kmem grovellers). How do you imagine conversion of the existing proplib uses? I doubt it is a realistic choice for this purpose. In its time, XDR was not adopted for this purpose either.

-- 
Mindaugas
Re: The lamentation of proplib(3)
On 28/01/2014 22:16, Mindaugas Rasiukevicius wrote:
> The long term objective would be to replace and eliminate proplib(3) from the tree. The short to medium term objective is to provide an alternative, start using it and gradually convert proplib uses. Yes, we will need to add compatibility code for the Property List format, which is going to be very depressing. Nobody said it is going to be a trivial task. The riddance is not going to happen any time soon. We just have to start somewhere.

Indeed. I think it is better to leave proplib(3) as it is; it will go away by itself as subsystems are updated in the long run.

>> Secondly, there are tons of interchange formats out there. libnv is one more, with its original author stating that it is not really meant as a replacement for XML/JSON.
>
> The library provides an interface to pack and transport the data. As far as the caller is concerned, it does not matter what serialisation format it uses.

That's the purpose of the lib. However I disagree for the caller: it does matter, indirectly. A horribly inefficient serialization means that the lib will not get widespread use. Besides, if the serialization format has limitations (no nesting allowed, key uniqueness, ...), it cannot replace proplib(3) 1:1.

> There is no reason why it could not use JSON or <insert your favourite format>. I think the default format should be binary, though.

I agree.

>> - the error handling is weak IMHO; given the potential large use of such a library, it should support richer semantics than a blunt errno (something equivalent to a gai_strerror(3) maybe);
>
> Why? Most of the use cases in our tree do not really need granularity on errors - you either retrieve (or construct) the whole thing or you fail.

Well, you have to know /why/ it failed when you construct it. EINVAL is not really informative: duplicate key, depth limit, out of memory, out of bound (for string or int encoding)...
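The two positions on error handling are not mutually exclusive. The sketch below, in C, combines a sticky (accumulated) error with a richer error code and a gai_strerror(3)-style lookup. Everything here is hypothetical: `kv_list`, `kv_add_number()`, `kv_get_error()` and `kv_strerror()` are invented for illustration and are not part of libnv, whose accumulated error is reported through nvlist_error() as an errno value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error codes mirroring the failure causes named above. */
enum kv_error { KV_OK = 0, KV_EDUPKEY, KV_EDEPTH, KV_ENOMEM, KV_ERANGE };

/* gai_strerror(3)-style mapping from code to message. */
const char *
kv_strerror(enum kv_error e)
{
	switch (e) {
	case KV_OK:      return "no error";
	case KV_EDUPKEY: return "duplicate key";
	case KV_EDEPTH:  return "nesting depth limit exceeded";
	case KV_ENOMEM:  return "out of memory";
	case KV_ERANGE:  return "value out of bounds";
	}
	return "unknown error";
}

#define KV_MAX 8	/* fixed capacity keeps the sketch allocation-free */

struct kv_list {
	const char *keys[KV_MAX];
	uint64_t vals[KV_MAX];
	size_t n;
	enum kv_error err;	/* first error seen; sticky from then on */
};

void
kv_init(struct kv_list *l)
{
	memset(l, 0, sizeof(*l));
}

/* Adds never report failure directly; the first failure is remembered. */
void
kv_add_number(struct kv_list *l, const char *key, uint64_t val)
{
	if (l->err != KV_OK)
		return;		/* an earlier add failed: this one is a no-op */
	for (size_t i = 0; i < l->n; i++) {
		if (strcmp(l->keys[i], key) == 0) {
			l->err = KV_EDUPKEY;
			return;
		}
	}
	if (l->n == KV_MAX) {
		l->err = KV_ENOMEM;
		return;
	}
	l->keys[l->n] = key;
	l->vals[l->n] = val;
	l->n++;
}

/* Checked once, after the whole list has been constructed. */
enum kv_error
kv_get_error(const struct kv_list *l)
{
	return l->err;
}
```

The caller still checks only once at the end of construction, which keeps the simplicity of an accumulated error, but the single check now says which kind of failure occurred.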
> That is why accumulated error is so useful, it would simplify many cases in our tree. If we add support for schemas, then the schema validation code is the routine which could be more informative.

I cannot see how nvlist_error() can carry this information. How is the API supposed to inform the caller that the schema validation code is wrong and not nvl?

>> - it does not seem to offer a way to serialize kernel shared structures easily. It is particularly convenient to have because it avoids user/kernel roundtrips when you want to expose kernel structures without syscall overhead (instead of playing with ioctl or low-level mmap).
>
> Can you be more specific?

This is probably badly expressed on my part. Two things:

1 - I was thinking about sysctl. *stat(8) binaries use sysctl(3) to query about structures (io_sysctl, clockinfo, ...) that get shared between userland and kernel. There is no reflection here, the caller has to use the correct structure if it wants to get the proper decoding. Else it ends badly. An interchange format has to detect decoding mismatches, especially when they pose security/integrity issues (information leak, out of bound values).

2 - regular polling of statistics. Following the sysctl example, in the case of top (but any other stat would do: sysstat, iostat, netstat, ...) the values are regularly updated by copying them from kernel back in userland. I have met from time to time system-specific APIs to map such values in userland read-only, to avoid pinging back the kernel for their update (Xen iorings, L4 flexpage, can't remember for the others), but there was no library to manipulate them through a higher level interface (for example when you want to pass driver hardware counters). So I had to roll my own. Not difficult when you access atomic-friendly values (integers and such), less so about strings or objects.

>> Why did they consider rolling out libnv when there are alternatives like protocol buffers or thrift?
>> Granted, those tools are meant for higher level languages and RPCs, but if NetBSD managed to use XML in kernel, I suppose those would fit too...
>
> Google protocol buffers and Apache Thrift work in a different way - they generate the code for you based on a provided schema, to conveniently and efficiently implement RPCs. It is XDR on steroids - 1980s technology refurbished for the modern day (e.g. including some schema versioning, compression, a bunch of tools, etc). The libraries we are talking about merely perform dynamic data serialisation at run-time. Both approaches have their merits, but for all intents and purposes we are not going to shift to a different approach, or rather paradigm, at this point.

I have no experience there. They bring interesting properties though: compiler checks for the API, optimizations (you can get really compact, efficient structures when you specify upper/lower bounds), and are suitable for RPC. That can become an interesting property for async communications.

> Also, proplib uses horrible
Re: The lamentation of proplib(3)
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:
>> If you have taken a look at the manual page and the examples section, the API is straightforward. I do not think that we all need to hold our hands and learn together that prop_dictionary_create() would be replaced with nvlist_create(), prop_dictionary_set_uint64() with nvlist_add_number() and so on. The idea is to move to a different (just sane) implementation, not a different concept.
>
> Then what's the benefit?

In this case, the proplib implementation has such major deficiencies that replacing it itself is a virtue. Fixing is not the case here, as it would basically mean rewriting.

> Switching to nvlist sounds like a step back because it doesn't support nested aggregate structures like are used in hdaudio, envsys, drvctl, &c., and in spite of the hassle to convert everything to nvlist, for compatibility's sake we'd have to keep all the proplib support for a while anyway.

It does support nesting, see nvlist_add_nvlist().

>> I am not against such approach per se, but our needs (given the cases in NetBSD tree) are quite lower than what Protocol Buffers were designed for. Most use cases in our tree are for data transfers between the user and the kernel (just as a more convenient alternative to ioctls + structs and kmem grovellers). How do you imagine conversion of the existing proplib uses? I doubt it is a realistic choice for this purpose. In time, XDR was not adopted for this purpose either.
>
> We identify the data structures in each case, write down the schema, convert prop_dictionary_set_uint8(msg, xyz, 32) to msg->xyz = 32, replace prop_dictionary_copyin_ioctl by protobuf_copyin_ioctl, &c. The only substantial difference between proplib (or xdr) and protobuf for this concern is whether the schema is formally written down and checked by the compiler -- the data types are all basically the same.

Which is what XDR does, except it is quite dated by now.
My point was that the community did not really adopt such an approach for the purpose of user-kernel communication, and I am a bit sceptical that it will, especially when it is even more work than migrating to libnv(3). However, if you want to prototype a lightweight C implementation of Protocol Buffers and propose it here for discussion - why not. I would prototype XDR 2.0, but it is just a matter of taste. :)

-- 
Mindaugas
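For readers weighing the two approaches, here is a minimal C sketch of the schema-first style discussed in this exchange, where a message is a plain struct and setting a field is an ordinary assignment checked by the compiler. The struct `xyz_msg`, its field names, and the 5-byte layout are all hypothetical. The pack and unpack functions are hand-written stand-ins for what a schema compiler would generate, and the layout is deliberately not the Protocol Buffers or XDR wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type; a schema compiler would emit this struct. */
struct xyz_msg {
	uint8_t  xyz;
	uint32_t count;
};

/* Illustrative layout: 1 byte xyz, then count as 4 big-endian bytes. */
#define XYZ_MSG_WIRE_SIZE 5

/* Returns the number of bytes written, or 0 if buf is too small. */
size_t
xyz_msg_pack(const struct xyz_msg *m, uint8_t *buf, size_t len)
{
	assert(m != NULL && buf != NULL);
	if (len < XYZ_MSG_WIRE_SIZE)
		return 0;
	buf[0] = m->xyz;
	buf[1] = (uint8_t)(m->count >> 24);
	buf[2] = (uint8_t)(m->count >> 16);
	buf[3] = (uint8_t)(m->count >> 8);
	buf[4] = (uint8_t)m->count;
	return XYZ_MSG_WIRE_SIZE;
}

/* Returns 0 on success, -1 if the input is not exactly one message. */
int
xyz_msg_unpack(struct xyz_msg *m, const uint8_t *buf, size_t len)
{
	assert(m != NULL && buf != NULL);
	if (len != XYZ_MSG_WIRE_SIZE)
		return -1;
	m->xyz = buf[0];
	m->count = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
	    ((uint32_t)buf[3] << 8) | (uint32_t)buf[4];
	return 0;
}
```

With the dynamic style, a misspelled key or a wrong value type is only discovered at run time; here `m.xyz = 32` either compiles or it does not. The cost, as noted in the thread, is that every message needs a schema and generated code.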
Re: The lamentation of proplib(3)
On Tue, Jan 28, 2014 at 09:02:43PM +0100, Jean-Yves Migeon wrote:
> Replacement ain't that easy, proplib(3) is used throughout the tree in multiple places, and there are parts where a full-scale replacement is not trivial (quotas for one).

Quotas don't use proplib. All the quota proplib stuff was ripped out before -6 was branched.

Judging by how much work that was... ripping proplib out of everything else would be slow going. But in many of the other cases, we don't really need to rip out proplib entirely and redo everything, just change over to whatever new simpler library we end up with.

The new library *does* need to be able to handle proplib xml blobs, at least as compat code.

-- 
David A. Holland
dholl...@netbsd.org