Re: The lamentation of proplib(3)

2014-02-07 Thread David Holland
On Wed, Jan 29, 2014 at 03:05:41AM +, Mindaugas Rasiukevicius wrote:
  In this case, the proplib implementation has such major deficiencies that
  replacing it itself is a virtue.  Fixing is not the case here, as it would
  basically mean rewriting.

Since this thread has died down again, here are some suggestions on
how we should move forward:

(1) It seems the consensus over the last 5+ years is that the library
we want is a data transfer library, not a data storage library. If you
want the latter, we have libdb and sqlite. The difference here is
whether the data you're handling lives primarily in the library or if
you normally maintain it in your own representation and stuff it into
the library only for transfer. (And pull it out immediately after
receive.)

If anyone disagrees violently with this conclusion it's time to speak
up.

(2) We have not yet converged on a data model. The sane choices are,
pretty much, (a) flat key-value store, (b) recursive tree of key-value
stores, (c) relational tables, (d) property graph model, or *maybe*
(e) rdf data model.

The sense I have from the various discussions is that (a) isn't enough
for at least some of the applications we have. My guess at this point
is that we want (b). The chief advantage of the more elaborate data
models is that they allow sane encoding of more complicated data; also
they have, or can have, simpler transfer encodings because they
*aren't* recursive.

I don't think we have any immediate use for graph data, but I think we
might in the future; e.g. save files for games, and also the device
attachment tree in the kernel keeps threatening to turn into a graph.

Similarly I don't see any immediate use for relational data, but we
might in the future; in particular if we improve the nsswitch
interface we might want to use a relational data transfer library to
ship config data around. Inasmuch as Unix config data is tabular, and
it isn't always key/value, a relational library would fit this case
better.

However, I think what we should do for the moment is go with (b), and
if we want a relational or graph library later, we can write another
library with a similar interface -- unlike the current proplib the new
library is supposed to be small and simple, and I don't see that
having as many as three of them with different purposes would
constitute a problem. And if we want we can always build them so they
share enough of the transfer format that they can read each other's
simple forms.

(3) It is clear that we need schema definitions and schema
enforcement. I think this should be handled as follows:
(let libfoo be the name of the replacement library)
   (a) the machine readable form of a libfoo schema should itself be a
   libfoo blob;
   (b) there should be a separate libfooschema library (or maybe
   libfoocheck, validate, whatever) that checks a libfoo blob
   against a libfoo schema blob;
   (c) there should be a human-readable DDL and code that reads it
   into a libfoo schema blob (maybe in libfooschema, maybe in a
   third library);
   (d) there should be a program that takes a libfoo schema (probably
   in the DDL form) and generates code that packs and unpacks
   libfoo blobs of this schema into real data structures;
   (e) there should maybe also be a program that does this and
   generates code that reads and writes the transfer format
   directly without materializing a libfoo blob, but we don't need
   this up front;

All of this is reasonably straightforward, but it can't be written
until we settle the other open issues.
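
As a rough illustration of item (3)(b) only, here is a minimal sketch of checking a flat message against a schema. Everything in it is hypothetical: the `foo_*` names, the three type tags, and the choice to express the schema as a plain C array instead of a libfoo blob, which is what (3)(a) actually asks for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; a real library would have more. */
enum foo_type { FOO_BOOL, FOO_NUMBER, FOO_STRING };

/* A message field: only the key and type tag matter for checking. */
struct foo_field {
	const char *key;
	enum foo_type type;
};

/* A schema entry: expected type, and whether the key must be present. */
struct foo_schema_field {
	const char *key;
	enum foo_type type;
	int required;
};

enum foo_check {
	FOO_CHECK_OK,
	FOO_CHECK_MISSING,	/* a required key is absent */
	FOO_CHECK_BADTYPE,	/* key present with the wrong type */
	FOO_CHECK_UNKNOWN	/* key not declared in the schema */
};

static const struct foo_schema_field *
schema_find(const struct foo_schema_field *schema, size_t nschema,
    const char *key)
{
	for (size_t i = 0; i < nschema; i++) {
		if (strcmp(schema[i].key, key) == 0)
			return &schema[i];
	}
	return NULL;
}

static enum foo_check
foo_check_blob(const struct foo_schema_field *schema, size_t nschema,
    const struct foo_field *msg, size_t nmsg)
{
	/* Every message field must be declared with a matching type. */
	for (size_t i = 0; i < nmsg; i++) {
		const struct foo_schema_field *sf =
		    schema_find(schema, nschema, msg[i].key);
		if (sf == NULL)
			return FOO_CHECK_UNKNOWN;
		if (sf->type != msg[i].type)
			return FOO_CHECK_BADTYPE;
	}
	/* Every required schema field must appear in the message. */
	for (size_t j = 0; j < nschema; j++) {
		int found = 0;
		if (!schema[j].required)
			continue;
		for (size_t i = 0; i < nmsg; i++) {
			if (strcmp(msg[i].key, schema[j].key) == 0)
				found = 1;
		}
		if (!found)
			return FOO_CHECK_MISSING;
	}
	return FOO_CHECK_OK;
}
```

Returning a distinct code per failure kind is a deliberate choice here, so the caller can tell a missing key from a mistyped one.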

(4) We need to converge on an API. I think what we should do is:
everyone who has ideas on the subject design one, or collaborate with
someone else who's designing one. Keep talking, and everybody steal
everybody else's best ideas. :-) That way we can probably converge on
a small number of alternatives.

I committed a copy of my 2008 jetlib strawman proposal to othersrc,
but it needs some attention before it can be a serious candidate. The
FreeBSD libnv is another candidate.

(5) We need to converge on a transfer format. Nobody likes XML, so we
should pick something else as the native form. I think we should
seriously consider JSON and one of the saner RDF formats for text, and
XDR and one or more ASN.1 protocols for binary, as well as perhaps
others -- there is also a simple binary encoding in the jetlib code
that we could adopt but it has nothing much to offer. And there are
probably others. Nothing says we have to allow the full generality of
whatever we choose, just as proplib doesn't allow general XML.

Note that the new library has to be able to read and write compatible
proplib XML bundles regardless, as standard compat obligations require
that.

It would be nice to be able to load transfer formats as plugins,
including in the form of the library that gets compiled into the
kernel. The jetlib code is structured so that this would be possible
(but it doesn't directly 

Re: The lamentation of proplib(3)

2014-02-07 Thread Joerg Sonnenberger
On Fri, Feb 07, 2014 at 06:28:59PM +, David Holland wrote:
 
 (1) It seems the consensus over the last 5+ years is that the library
 we want is a data transfer library, not a data storage library. If you
 want the latter, we have libdb and sqlite.

Agreed, but please avoid db(3) :)

 (2) We have not yet converged on a data model. The sane choices are,
 pretty much, (a) flat key-value store, (b) recursive tree of key-value
 stores, (c) relational tables, (d) property graph model, or *maybe*
 (e) rdf data model.

Typed, nested key-value. So basically (b).
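
To make "typed, nested key-value" concrete, here is a minimal sketch of what such a model could look like in C. The `kv_*` names and the particular set of types are assumptions for illustration; they are not taken from proplib, libnv, or jetlib.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value types for a typed, nested key-value model. */
enum kv_type { KV_BOOL, KV_NUMBER, KV_STRING, KV_DICT };

struct kv_pair;

/* A dictionary is an array of pairs; nesting comes from KV_DICT values. */
struct kv_dict {
	const struct kv_pair *pairs;
	size_t npairs;
};

struct kv_pair {
	const char *key;
	enum kv_type type;	/* selects the valid member of u */
	union {
		int b;
		uint64_t num;
		const char *str;
		struct kv_dict dict;
	} u;
};

/* Look up a key in one dictionary level; returns NULL if absent. */
static const struct kv_pair *
kv_lookup(const struct kv_dict *d, const char *key)
{
	for (size_t i = 0; i < d->npairs; i++) {
		if (strcmp(d->pairs[i].key, key) == 0)
			return &d->pairs[i];
	}
	return NULL;
}

/* Nesting depth of a dictionary; a flat dictionary has depth 1. */
static size_t
kv_depth(const struct kv_dict *d)
{
	size_t max = 0;

	for (size_t i = 0; i < d->npairs; i++) {
		if (d->pairs[i].type == KV_DICT) {
			size_t sub = kv_depth(&d->pairs[i].u.dict);
			if (sub > max)
				max = sub;
		}
	}
	return max + 1;
}
```

The type tag next to the union is what makes the model "typed": a reader has to check `type` before touching `u`, and a schema checker can compare tags without looking at values.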

 (3) It is clear that we need schema definitions and schema
 enforcement.

Efficient ways to impose limits at parse time would also fit into this
area. Saying I don't want input larger than 10KB is a very blunt way
to avoid resource exhaustion as the typical problems are more granular.
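
One way to picture more granular limits is a small set of counters that the parser checks as it goes, instead of one cap on total input size. The sketch below is only an assumption about how that might look; the `parse_*` names and the three limits chosen are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-parse resource limits. */
struct parse_limits {
	size_t max_depth;	/* deepest allowed nesting */
	size_t max_keys;	/* total keys across the whole message */
	size_t max_strlen;	/* longest single string value */
};

/* Running totals the parser updates as it consumes input. */
struct parse_state {
	const struct parse_limits *lim;
	size_t depth;
	size_t keys;
};

enum parse_status {
	PARSE_OK,
	PARSE_TOO_DEEP,
	PARSE_TOO_MANY_KEYS,
	PARSE_STRING_TOO_LONG
};

/* Called when the parser is about to descend into a nested dictionary. */
static enum parse_status
parse_enter_dict(struct parse_state *st)
{
	if (st->depth >= st->lim->max_depth)
		return PARSE_TOO_DEEP;
	st->depth++;
	return PARSE_OK;
}

/* Called when the parser finishes a nested dictionary. */
static void
parse_leave_dict(struct parse_state *st)
{
	if (st->depth > 0)
		st->depth--;
}

/* Called once per key, at any nesting level. */
static enum parse_status
parse_add_key(struct parse_state *st)
{
	if (st->keys >= st->lim->max_keys)
		return PARSE_TOO_MANY_KEYS;
	st->keys++;
	return PARSE_OK;
}

/* Called before allocating storage for a string of the given length. */
static enum parse_status
parse_check_string(const struct parse_state *st, size_t len)
{
	return len > st->lim->max_strlen ? PARSE_STRING_TOO_LONG : PARSE_OK;
}
```

Because each check runs before the corresponding allocation or recursion, a hostile message is rejected at the point where it exceeds a limit, not after it has been read in full.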

Joerg


Re: The lamentation of proplib(3)

2014-01-29 Thread James K. Lowden
On Tue, 28 Jan 2014 21:36:48 +
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:

 libnv may be more type-safe as an API itself than proplib, but if we
 are going to seriously adopt something for formal protocols, it ought
 to have schemas that support enforcement in the C type system so that
 the protocols can have type-safe APIs too.

I have thought for a long time that a limited form of reflection should
be added to C and C++ precisely to support serialization of simple
data structures.  I know we can't fix C here and now, nor even in time
for what you propose.  I'd like to ask, though, if you agree that the
features I describe below would, were they available, be helpful to
what you propose.  

From XDR onward, serialization has been a persistent nuisance (so to
speak) because the compiler discards the information needed to provide
type-checked I/O.   

Every C struct can be described by a list of tuples, basically 

offset, name, type, length

and it *should* be possible to iterate over that list to perform I/O.
All that is needed is for the compiler to capture that information in a
table, and for the language to provide access to it.  
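
Lacking compiler support, such a table can be written by hand today. The sketch below shows the idea under stated assumptions: `struct disk_stats`, the `FIELD` macro, and the `field_*` names are all made up for illustration, and the packed output uses native byte order, so it is not a portable wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags for the table. */
enum field_type { FT_U32, FT_U64, FT_CHARS };

/* One (offset, name, type, length) tuple per struct member. */
struct field_desc {
	size_t offset;
	const char *name;
	enum field_type type;
	size_t length;
};

/* Example structure; not a real kernel structure. */
struct disk_stats {
	uint32_t unit;
	uint64_t reads;
	char name[16];
};

/* Build one table entry from a struct type and member name. */
#define FIELD(st, member, ftype) \
	{ offsetof(st, member), #member, (ftype), sizeof(((st *)0)->member) }

static const struct field_desc disk_stats_fields[] = {
	FIELD(struct disk_stats, unit, FT_U32),
	FIELD(struct disk_stats, reads, FT_U64),
	FIELD(struct disk_stats, name, FT_CHARS),
};

/*
 * Copy each field, in table order, into a packed buffer with no padding.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t
pack_fields(const void *obj, const struct field_desc *fd, size_t nfd,
    unsigned char *out, size_t outlen)
{
	size_t pos = 0;

	for (size_t i = 0; i < nfd; i++) {
		if (fd[i].length > outlen - pos)
			return 0;
		memcpy(out + pos, (const unsigned char *)obj + fd[i].offset,
		    fd[i].length);
		pos += fd[i].length;
	}
	return pos;
}
```

The weakness is the one described above: the type column is filled in by hand, so nothing stops it from drifting out of sync with the struct. A compiler-generated table would remove that risk.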

It's already being done, just badly.  Some of the information, such as
offset, is weirdly available via macros.  The rest is provided to the
debugger in some horrible nonstandard way that is inaccessible to the
program.  

My question to you and the others participating in this thread is
simple: Do you think such a table would both reduce the effort of
writing I/O libraries and improve their correctness?  I for one can
imagine tying that information to writev/readv to programmatically
generate iovec arrays and validate type information at read time.  
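
A sketch of the iovec idea, assuming a hand-written field table stands in for the compiler-provided one. `struct clock_sample` and `fields_to_iov()` are hypothetical names; `struct iovec` is the standard one from `<sys/uio.h>`.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Minimal field descriptor: where the member lives and how big it is. */
struct field_desc {
	size_t offset;
	size_t length;
};

/* Example structure; not a real kernel structure. */
struct clock_sample {
	uint32_t hz;
	uint64_t ticks;
};

static const struct field_desc clock_sample_fields[] = {
	{ offsetof(struct clock_sample, hz), sizeof(uint32_t) },
	{ offsetof(struct clock_sample, ticks), sizeof(uint64_t) },
};

/*
 * Point one iovec at each field of obj, skipping any padding between
 * members. Returns the number of entries filled, or -1 if iov[] is
 * too small to hold them all.
 */
static int
fields_to_iov(void *obj, const struct field_desc *fd, size_t nfd,
    struct iovec *iov, size_t niov)
{
	if (nfd > niov)
		return -1;
	for (size_t i = 0; i < nfd; i++) {
		iov[i].iov_base = (char *)obj + fd[i].offset;
		iov[i].iov_len = fd[i].length;
	}
	return (int)nfd;
}
```

The filled array can be handed to writev(2) to send the fields without padding, or to readv(2) to scatter incoming bytes straight into the members. Validating types at read time would need a type tag per entry and a tagged wire format, which this sketch leaves out.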

It's not an idle question.  It would be good to have a clear example
use-case.  

--jkl


Re: The lamentation of proplib(3)

2014-01-28 Thread Christian Koch
On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
 and my own dissatisfaction has reached the point where I decided to raise
 the question.  The question of replacing proplib(3) with a better library.
 There were ideas by some developers to write a new library from scratch.
 The FreeBSD project has recently developed a general purpose key-value pair
 library, which is quite similar to nvpair library in Solaris.

Isn't proplib(3) quite heavily used throughout the system, both kernel space and
user space? It won't be a trivial task to fully make this change, is all I'm
saying.

I say don't get rid of proplib(3) entirely, how about moving it to pkgsrc at
least?

-Christian


Re: The lamentation of proplib(3)

2014-01-28 Thread Jean-Yves Migeon

Le 28/01/2014 19:44, Mindaugas Rasiukevicius a écrit :

Hello,


Hi,


Many developers have been dissatisfied with proplib(3) for quite a while
and my own dissatisfaction has reached the point where I decided to raise
the question.  The question of replacing proplib(3) with a better library.
There were ideas by some developers to write a new library from scratch.
The FreeBSD project has recently developed a general purpose key-value pair
library, which is quite similar to nvpair library in Solaris.

[snip]



Discuss.

[1] http://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014464.html
[2] http://nxr.netbsd.org/xref/src-freebsd/lib/libnv/
[3] http://www.freebsd.org/cgi/man.cgi?query=nv&manpath=FreeBSD+11-current


I agree on the ugliness/impracticality of proplib(3), and got myself 
bitten twice. So I am all for alternatives.


Firstly, one clarification: is your intent to /replace/ proplib(3) or to 
/provide/ a simpler interchange format for userland/kernel, and keep 
proplib(3) in its place for historical purposes?


Replacement ain't that easy, proplib(3) is used throughout the tree in
multiple places, and there are parts where a full-scale replacement is
not trivial (quotas for one).


Secondly, there are tons of interchange formats out there. libnv is one
more, with its original author stating that it is not really meant as a 
replacement for XML/JSON.


To bring some momentum to the discussion, here are some remarks 
regarding libnv:

- I like the idea of offering an easy way to pass forth file descriptors;
- the error handling is weak IMHO; given the potential large use of such 
a library, it should support richer semantics than a blunt errno 
(something equivalent to a gai_strerror(3) maybe);
- it does not seem to offer a way to serialize kernel shared structures 
easily. It is particularly convenient to have because it avoids 
user/kernel roundtrips when you want to expose kernel structures without 
syscall overhead (instead of playing with ioctl or low-level mmap).


Why did they consider rolling out libnv when there are alternatives like 
protocol buffers or thrift? Granted, those tools are meant for higher 
level languages and RPCs, but if NetBSD managed to use XML in kernel, I
suppose those would fit too...


Cheers,

--
Jean-Yves Migeon


Re: The lamentation of proplib(3)

2014-01-28 Thread Maxime Villard
Le 28/01/2014 19:44, Mindaugas Rasiukevicius a écrit :
 [...]
 
 - Last but not least, it does not have awkward API naming, such as
   prop_data_create_data_nocopy() or prop_number_unsigned_integer_value().

I particularly agree on this one, hehe




Re: The lamentation of proplib(3)

2014-01-28 Thread John Nemeth
On Jan 28,  7:40pm, Christian Koch wrote:
} On Tue, Jan 28, 2014 at 06:44:57PM +, Mindaugas Rasiukevicius wrote:
}  and my own dissatisfaction has reached the point where I decided to raise
}  the question.  The question of replacing proplib(3) with a better library.
}  There were ideas by some developers to write a new library from scratch.
}  The FreeBSD project has recently developed a general purpose key-value pair
}  library, which is quite similar to nvpair library in Solaris.
} 
} Isn't proplib(3) quite heavily used throughout the system, both
} kernel space and user space?  It won't be a trivial task to fully

 It is.

} make this change, is all I'm saying.

 Definitely.  Also, nvlist doesn't address one of the significant
uses of proplib.

} I say don't get rid of proplib(3) entirely, how about moving it
} to pkgsrc at least?

 Something that is heavily used throughout the system can not
be moved to pkgsrc.  Pkgsrc is an addon, not part of the base
system.  Thus nothing in the base system can be dependent upon
pkgsrc to function.

}-- End of excerpt from Christian Koch


Re: The lamentation of proplib(3)

2014-01-28 Thread Matthias Kretschmer
On Tue, Jan 28, 2014 at 09:36:48PM +, Taylor R Campbell wrote:
 I'm inclined to say we ought to use protocol buffers -- it supports
 C-enforceable schemas, has been widely adopted in the world, and
 satisfies more or less all your desiderata.  Parts of the wire format
 are a little wacky, but whatever.  The only trouble is we'd have to
 write a C-only implementation, but that shouldn't be too hard.

What do you mean by C-only?  Even though Google's standard
compiler generates C++, there is a to-pure-C compiler [1].  If you
are referring to the language used to implement the compiler
then this is not an option as the compiler is implemented in
C++.

[1] https://code.google.com/p/protobuf-c/


Re: The lamentation of proplib(3)

2014-01-28 Thread Mindaugas Rasiukevicius
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:
 I don't think there's much disagreement that proplib is wrong, but a
 proposal to replace it ought to include concrete examples of how
 current uses of proplib (or C structs or other wire data transmission
 formats) should be replaced, not just general murmuring that proplib
 sucks and there are better options.  That said...

If you have taken a look at the manual page and the examples section,
the API is straightforward.  I do not think that we all need to hold our
hands and learn together that prop_dictionary_create() would be replaced
with nvlist_create(), prop_dictionary_set_uint64() with nvlist_add_number()
and so on.  The idea is to move to a different (just sane) implementation,
not a different concept.

 I'm inclined to say we ought to use protocol buffers -- it supports
 C-enforceable schemas, has been widely adopted in the world, and
 satisfies more or less all your desiderata.  Parts of the wire format
 are a little wacky, but whatever.  The only trouble is we'd have to
 write a C-only implementation, but that shouldn't be too hard.

I am not against such an approach per se, but our needs (given the cases in
the NetBSD tree) are much more modest than what Protocol Buffers were
designed for.
Most use cases in our tree are for data transfers between the user and the
kernel (just as a more convenient alternative to ioctls + structs and kmem
grovellers).  How do you imagine conversion of the existing proplib uses?
I doubt it is a realistic choice for this purpose.  In time, XDR was not
adopted for this purpose either.

-- 
Mindaugas


Re: The lamentation of proplib(3)

2014-01-28 Thread Jean-Yves Migeon

Le 28/01/2014 22:16, Mindaugas Rasiukevicius a écrit :

The long term objective would be to replace and eliminate proplib(3) from
the tree.  The short to medium term objective is to provide an alternative,
start using it and gradually convert proplib uses.  Yes, we will need to
add compatibility code for the Property List format, which is going to be
very depressing.

Nobody said it is going to be a trivial task.  The riddance is not going
to happen any time soon.  We just have to start somewhere.


Indeed. I think it is better to leave proplib(3) as it is; it will go
away by itself as subsystems are updated in the long run.



Secondly, there are tons of interchange formats out there. libnv is one
more, with its original author stating that it is not really meant as a
replacement for XML/JSON.


The library provides an interface to pack and transport the data.  As far
as the caller is concerned, it does not matter what serialisation format
it uses.


That's the purpose of the lib.

However I disagree for the caller: it does matter, indirectly. A 
horribly inefficient serialization means that the lib will not get 
widespread use.


Besides, if the serialization format has limitations (no nesting allowed,
key uniqueness, ...), it cannot replace proplib(3) 1:1.



 There is no reason why it could not use JSON or insert your
favourite format.  I think the default format should be binary, though.


I agree.


- the error handling is weak IMHO; given the potential large use of such
a library, it should support richer semantics than a blunt errno
(something equivalent to a gai_strerror(3) maybe);


Why?  Most of the use cases in our tree do not really need granularity on
errors - you either retrieve (or construct) the whole thing or you fail.


Well, you have to know /why/ it failed when you construct it. EINVAL is 
not really informative: duplicate key, depth limit, out of memory, out
of bounds (for string or int encoding)...
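
As a sketch of what richer error semantics could look like, here is a small library-specific error enum with a gai_strerror(3)-style lookup. The `foo_err` codes and `foo_strerror()` are hypothetical; they simply mirror the failure kinds listed above and are not part of libnv or proplib.

```c
/* Hypothetical error codes, one per failure kind named above. */
enum foo_err {
	FOO_ERR_NONE = 0,
	FOO_ERR_DUPKEY,		/* key already present */
	FOO_ERR_DEPTH,		/* nesting limit exceeded */
	FOO_ERR_NOMEM,		/* allocation failed */
	FOO_ERR_RANGE		/* string or integer does not fit the encoding */
};

/* Map an error code to a fixed, human-readable message. */
static const char *
foo_strerror(enum foo_err e)
{
	switch (e) {
	case FOO_ERR_NONE:
		return "no error";
	case FOO_ERR_DUPKEY:
		return "duplicate key";
	case FOO_ERR_DEPTH:
		return "nesting depth limit exceeded";
	case FOO_ERR_NOMEM:
		return "out of memory";
	case FOO_ERR_RANGE:
		return "value out of bounds for encoding";
	}
	return "unknown error";
}
```

Returning pointers to string literals keeps the function free of allocation, so it can be called from an error path that is already out of memory.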



That is why accumulated error is so useful, it would simplify many cases
in our tree.  If we add support for schemas, then the schema validation
code is the routine which could be more informative.
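
The accumulated-error pattern can be sketched without libnv itself. The `bag` type below is a hypothetical stand-in, not the libnv implementation: the first failure is latched in the object, later additions become no-ops, and the caller checks once at the end.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BAG_MAX 4	/* tiny fixed capacity, to keep the sketch short */

struct bag {
	const char *keys[BAG_MAX];
	unsigned long vals[BAG_MAX];
	size_t n;
	int error;	/* first error seen; 0 if none */
};

static void
bag_init(struct bag *b)
{
	memset(b, 0, sizeof(*b));
}

/* Add a number; does nothing if an earlier call has already failed. */
static void
bag_add_number(struct bag *b, const char *key, unsigned long val)
{
	if (b->error != 0)
		return;
	for (size_t i = 0; i < b->n; i++) {
		if (strcmp(b->keys[i], key) == 0) {
			b->error = EEXIST;	/* duplicate key */
			return;
		}
	}
	if (b->n == BAG_MAX) {
		b->error = ENOMEM;	/* capacity exhausted */
		return;
	}
	b->keys[b->n] = key;
	b->vals[b->n] = val;
	b->n++;
}

/* Report the first error recorded, or 0 if every call succeeded. */
static int
bag_error(const struct bag *b)
{
	return b->error;
}
```

A caller can then issue a run of `bag_add_number()` calls with no checks in between and test `bag_error()` once. The cost, as raised in this thread, is that a single errno value says little about which call failed.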


I cannot see how nvlist_error() can carry this information. How is the 
API supposed to inform the caller that the schema validation code is 
wrong and not nvl?



- it does not seem to offer a way to serialize kernel shared structures
easily. It is particularly convenient to have because it avoids
user/kernel roundtrips when you want to expose kernel structures without
syscall overhead (instead of playing with ioctl or low-level mmap).


Can you be more specific?


This is probably badly expressed on my part. Two things:

1 - I was thinking about sysctl.

*stat(8) binaries use sysctl(3) to query about structures (io_sysctl, 
clockinfo, ...) that get shared between userland and kernel. There is no 
reflection here, the caller has to use the correct structure if it wants 
to get the proper decoding. Else it ends badly.


An interchange format has to detect decoding mismatches, especially when
they pose security/integrity issues (information leak, out-of-bounds values).
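
A minimal sketch of one way to detect such a mismatch: prefix each record with a type identifier and a length, and refuse to decode unless both match what the caller expects. The header layout, `REC_CLOCKINFO`, and `struct clock_rec` are assumptions for illustration and do not describe how sysctl(3) works.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical self-describing record header. */
struct rec_header {
	uint32_t type_id;	/* which structure follows */
	uint32_t length;	/* size of the payload in bytes */
};

#define REC_CLOCKINFO 1u	/* hypothetical type identifier */

/* Example payload; not the real struct clockinfo. */
struct clock_rec {
	uint32_t hz;
	uint32_t tick_usec;
};

/*
 * Decode a clock_rec from buf. Returns 0 on success, or -1 if the
 * buffer is truncated or describes a different type or size.
 */
static int
decode_clock_rec(const unsigned char *buf, size_t buflen,
    struct clock_rec *out)
{
	struct rec_header h;

	if (buflen < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.type_id != REC_CLOCKINFO)
		return -1;	/* caller asked for the wrong structure */
	if (h.length != sizeof(*out))
		return -1;	/* layout differs from what we were built with */
	if (buflen - sizeof(h) < h.length)
		return -1;	/* payload truncated */
	memcpy(out, buf + sizeof(h), sizeof(*out));
	return 0;
}
```

The checks run before any payload byte is copied, so a mismatched caller gets an error instead of a silently misdecoded structure.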


2 - regular polling of statistics

Following the sysctl example, in the case of top (but any other stat
would do: sysstat, iostat, netstat, ...) the values are regularly
updated by copying them from the kernel back to userland.


From time to time I have come across system-specific APIs that map such
values into userland read-only, to avoid going back to the kernel for
every update (Xen iorings, L4 flexpage, can't remember the others), but
there was no library to manipulate them through a higher level interface
(for example when you want to pass driver hardware counters). So I had to
roll my own. Not difficult when you access atomic-friendly values
(integers and such), less so with strings or objects.



Why did they consider rolling out libnv when there are alternatives like
protocol buffers or thrift? Granted, those tools are meant for higher
level languages and RPCs, but if NetBSD managed to use XML in kernel, I
suppose those would fit too...


Google protocol buffers and Apache Thrift work in a different way - they
generate the code for you based on a provided schema, to conveniently and
efficiently implement RPCs.  It is XDR on steroids - 1980s technology
refurbished for the modern day (e.g. including some schema versioning,
compression, a bunch of tools, etc).  The libraries we are talking about
merely perform dynamic data serialisation at run-time.  Both approaches
have their merits, but for all intents and purposes we are not going to
shift to a different approach, or rather paradigm, at this point.


I have no experience there.

They bring interesting properties though: compiler checks for the API, 
optimizations (you can get really compact, efficient structures when you 
specify upper/lower bounds), and are suitable for RPC. Can become an 
interesting property for async communications.



Also, proplib uses horrible 

Re: The lamentation of proplib(3)

2014-01-28 Thread Mindaugas Rasiukevicius
Taylor R Campbell campbell+netbsd-tech-k...@mumble.net wrote:
If you have taken a look at the manual page and the examples section,
the API is straightforward.  I do not think that we all need to hold
 our hands and learn together that prop_dictionary_create() would be
 replaced with nvlist_create(), prop_dictionary_set_uint64() with
 nvlist_add_number() and so on.  The idea is to move to a different (just
 sane) implementation, not a different concept.
 
 Then what's the benefit?

In this case, the proplib implementation has such major deficiencies that
replacing it itself is a virtue.  Fixing is not the case here, as it would
basically mean rewriting.

 Switching to nvlist sounds like a step back
 because it doesn't support nested aggregate structures like are used
 in hdaudio, envsys, drvctl, &c., and in spite of the hassle to convert
 everything to nvlist, for compatibility's sake we'd have to keep all
 the proplib support for a while anyway.

It does support nesting, see nvlist_add_nvlist().

   I am not against such approach per se, but our needs (given the cases
 in NetBSD tree) are quite lower than what Protocol Buffers were designed
 for. Most use cases in our tree are for data transfers between the user
 and the kernel (just as a more convenient alternative to ioctls + structs
 and kmem grovellers).  How do you imagine conversion of the existing
 proplib uses? I doubt it is a realistic choice for this purpose.  In
 time, XDR was not adopted for this purpose either.
 
 We identify the data structures in each case, write down the schema,
 convert prop_dictionary_set_uint8(msg, xyz, 32) to msg->xyz = 32,
 replace prop_dictionary_copyin_ioctl by protobuf_copyin_ioctl, &c.
 The only substantial difference between proplib (or xdr) and protobuf
 for this concern is whether the schema is formally written down and
 checked by the compiler -- the data types are all basically the same.

Which is what XDR does, except it is quite dated by now.  My point was
that the community did not really adopt such an approach for the purpose of
user-kernel communication and I am a bit sceptical that it will, especially
when it is even more work than migrating to libnv(3).  However, if you
want to prototype a lightweight C implementation of Protocol Buffers and
propose it here for a discussion - why not.  I would prototype XDR 2.0,
but it is just a matter of taste. :)

-- 
Mindaugas


Re: The lamentation of proplib(3)

2014-01-28 Thread David Holland
On Tue, Jan 28, 2014 at 09:02:43PM +0100, Jean-Yves Migeon wrote:
  Replacement ain't that easy, proplib(3) is used throughout the tree
  in multiple places, and they are parts where a full-scale
  replacement is not trivial (quotas for one).

Quotas don't use proplib. All the quota proplib stuff was ripped out
before -6 was branched.

Judging by how much work that was... ripping proplib out of everything
else would be slow going. But in many of the other cases, we don't
really need to rip out proplib entirely and redo everything, just
change over to whatever new simpler library we end up with.

The new library *does* need to be able to handle proplib xml blobs, at
least as compat code.

-- 
David A. Holland
dholl...@netbsd.org