Re: [boost] Re: Sockets - what's the latest?
On Wed, Feb 12, 2003 at 06:11:59PM -0500, Jason House wrote:
> Once I heard there was a generic socket library in development, I thought I'd add a quick feature request. I would like to see the ability to have multiple streams through the same socket. This boils down to providing two distinct benefits.
>
> 1: Programs can easily perform complex communications over a single port.
> 2: Without multiple streams, problems can occur when there are multiple clients behind a proxy connecting to a host outside of the proxy.
>
> If the client only forms a single connection to the host, there won't be a problem because the random source port will differentiate each stream. So, when multiple clients connect to a host from behind a proxy, the host can only differentiate each stream by the random source port. So, when the clients form a second connection to the host, each stream gets differentiated from each other, but there is no mapping of random source port to the distinct client.

What you are asking for here is:
    http://chorus.sourceforge.net/
... in my slightly biased opinion. ;-)

This gives you a stackable transport framework which includes (among other things) a multiplexed channel. This project is still really new, but is shaping up nicely. However, it has no relation to boost and is primarily targeted at building distributed hash tables (p2p apps) that can go through any data channel.

---
Wes

___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
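One way to picture "multiple streams through the same socket" is to tag every chunk of data with a stream id and a length before it goes onto the single connection. The sketch below is only an illustration under stated assumptions: the frame layout (2-byte stream id, 2-byte length, both big-endian) and the names `encode_frame` and `Demultiplexer` are hypothetical and are not taken from chorus or from any Boost library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical frame layout (an assumption, not any library's wire format):
//   2 bytes stream id, 2 bytes payload length (both big-endian), then payload.
inline std::string encode_frame(std::uint16_t stream_id, const std::string& payload) {
    assert(payload.size() <= 0xFFFF);   // length must fit the 2-byte field
    const std::uint16_t len = static_cast<std::uint16_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>(stream_id >> 8));
    frame.push_back(static_cast<char>(stream_id & 0xFF));
    frame.push_back(static_cast<char>(len >> 8));
    frame.push_back(static_cast<char>(len & 0xFF));
    frame += payload;
    return frame;
}

// Reassembles per-stream byte sequences from bytes arriving on one connection.
class Demultiplexer {
    std::string pending_;                          // bytes not yet forming a whole frame
    std::map<std::uint16_t, std::string> streams_; // data received so far, per stream id
public:
    // Feed bytes exactly as they arrive; a chunk may end in the middle of a frame.
    void feed(const std::string& bytes) {
        pending_ += bytes;
        while (pending_.size() >= 4) {
            const unsigned char* p =
                reinterpret_cast<const unsigned char*>(pending_.data());
            const std::uint16_t id = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
            const std::size_t len = static_cast<std::size_t>((p[2] << 8) | p[3]);
            if (pending_.size() < 4 + len) break;  // wait for the rest of the frame
            streams_[id] += pending_.substr(4, len);
            pending_.erase(0, 4 + len);
        }
    }
    // Everything received so far on one logical stream (empty if none).
    std::string received(std::uint16_t id) const {
        std::map<std::uint16_t, std::string>::const_iterator it = streams_.find(id);
        return it == streams_.end() ? std::string() : it->second;
    }
};
```

Because the stream id travels inside the connection, the host no longer has to guess which source port belongs to which client: every logical stream of one client shares that client's single connection.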
Re: [boost] Serialization XML (was Serialization Library
Since no-one seemed to notice my prior post, which I think addressed some of these issues, I am reattaching it here.

On Thu, Nov 21, 2002 at 07:45:55AM -0800, Robert Ramey wrote:
> My question is whether XML can capture an arbitrary C++ structure in a meaningful and useful way. So far no one has presented any XML that captures that one proposed example.

I did.

> Well, I don't know that. In general it is extremely difficult to know ahead of time what facilities a serialization library would need to permit an XML archive to be generated. One would have to take the library, make the changes necessary to provide the desired result, and check to see what changes are necessary.

You will not need any hooks; to fully bracket the data, you can use a type-conversion trick made concrete below.

> * Some approaches, including XML, allow a practically unlimited number of different ways to represent the same data. The user rather than the serialization library should choose the particular design.

XSLT will allow this. As long as the serialization library can output to SOME form of useful XML (such as the hierarchical format I propose), the mapping between any particular schema and this format can be done as a relatively straightforward stylesheet.

> In the current system the following concepts are orthogonal:
>
> a) The description of which data should be saved for each class (save/load/version)
> b) composition of the above to handle arbitrary C++ data structures (serialization)
> c) description of how fundamental types should be encoded as a byte stream into a storage medium (archive)
>
> Assuming that the questions in my thought experiment could be answered in the affirmative, what would have to be added to this system to permit it to handle XML?
>
> Another concept has to be added - that of reflection. A useful XML representation needs the name of the variable. So some system needs to be designed to hold that information and keep it related to each serializable member.
> Presumably this would be an orthogonal concept d)

Yes; I had proposed in an earlier email a separate serializer which included the name strings:

    return o << "bar" << bar << "foo" << foo ...

This would provide the needed names to the system. The trick below provides the required hierarchical information. The XSLT provides user-customizable formats. Your existing system for base classes can do the diamond work.

Alternately, the normal streamer could be adapted to take these names by default, and ignore them for simpler data streams. A clever use of macros might also make this automatic.

> Given this, without too much effort and maybe adding some virtual functions to archive one could add begin/end tags to archive. Of course many would object to this on efficiency grounds but it would be possible.
>
> But things start to appear. What about versioning? Where does that fit into XML? But what about pointers, inheritance, etc.? To properly capture this in XML one would have to start altering b). It's the automatic composition that guarantees that this system can serialize/deserialize any C++ structure. I doubt this would be worth it.

I think that these are non-issues. Write them into the hierarchy in whatever way is most convenient and still restorable. The user needs to decide how he wants to represent these things himself in the style-sheet. When deserializing, another style-sheet should fill in whatever extras the user dropped from the output XML, such as version numbers, etc.

> Of course, anyone is free to take the current serialization system and experiment to see what it would really take to accommodate XML. (After all, it should be easy if I'm wrong). But it won't be me.
Here is a start: (but also the end of my contribution; I just thought this type conversion was a neat trick that someone might want to use)

---

// Example begins
// Compiles and works with g++-2.95.4

#include <iostream>
using namespace std;

// Common Framework

class object_stream;
class streamer
{
 protected:
    object_stream* m_impl;

    streamer(object_stream* stream) : m_impl(stream) { }

 public:
    template <class T>
    object_stream& operator << (const T& x);

 friend class object_stream;
};

class object_stream
{
 protected:
    streamer m_helper;

    virtual void object_begin() = 0;
    virtual void object_end  () = 0;

 public:
    object_stream() : m_helper(this) { }
    virtual ~object_stream() { }

    operator streamer& ()
    {   // Casted on return from method
        object_end();
        return m_helper;
    }

    // All fundamental types go here
    virtual object_stream& operator << (int x) = 0;

    // This catches all non-fundamental types and safely preserves
    // our type information while calling
    template <class T>
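The idea discussed in this message, a serializer that receives the member name along with each value and brackets nested objects, can be sketched as follows. This is a simplified illustration rather than the type-conversion trick itself: the class `xml_archive`, its `field` member, and the free `serialize` functions are hypothetical names chosen for this sketch, and the output is a plain nested-tag format that a stylesheet could later map onto a specific schema.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical archive: every value is written together with its member name.
class xml_archive {
    std::ostringstream out_;
public:
    // Fundamental type: a leaf element.
    xml_archive& field(const char* name, int value) {
        assert(name && *name);   // an element needs a non-empty name
        out_ << '<' << name << '>' << value << "</" << name << '>';
        return *this;
    }
    // Any other type: open a tag, let the type describe its members, close the tag.
    template <class T>
    xml_archive& field(const char* name, const T& value) {
        assert(name && *name);
        out_ << '<' << name << '>';
        serialize(*this, value);   // found by argument-dependent lookup
        out_ << "</" << name << '>';
        return *this;
    }
    std::string str() const { return out_.str(); }
};

struct Foo { int x; };
inline void serialize(xml_archive& ar, const Foo& f) { ar.field("x", f.x); }

struct Bar { int a; Foo b; int c; };
inline void serialize(xml_archive& ar, const Bar& v) {
    ar.field("a", v.a).field("b", v.b).field("c", v.c);
}

// Serializes one Bar and returns the generated text.
inline std::string demo_xml() {
    Bar v;
    v.a = 1; v.b.x = 2; v.c = 3;
    xml_archive ar;
    ar.field("bar", v);
    return ar.str();
}
```

The user code only states which members exist and what they are called; the begin/end bracketing falls out of the function boundary of `field`, which is the same observation the streamer example relies on.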
[boost] Bracketing a stream (was: Serialization to relational table)
On Tue, Nov 19, 2002 at 06:07:23PM +0100, Wesley W. Terpstra wrote:
> The trick is to use the FUNCTION boundary of the serializor.
> <snip code>

I have attached a working prototype. This is merely proof of concept; I am not sure whether one should bracket fundamental types, for instance.

The output is presently:
    [ 1 [ 2 ] 3 4 [ 5 ] ]
but maybe should be:
    [ [1] [ [2] ] [3] [4] [ [5] ] ]

What do people think? I am certain someone smarter than I could make this even more clever.

---
Wes

// Example begins
// Compiles and works with g++-2.95.4

#include <iostream>
using namespace std;

// Common Framework

class object_stream;
class streamer
{
 protected:
    object_stream* m_impl;

    streamer(object_stream* stream) : m_impl(stream) { }

 public:
    // Entering an object through a streamer opens a bracket
    template <class T>
    object_stream& operator << (const T& x);

 friend class object_stream;
};

class object_stream
{
 protected:
    streamer m_helper;

    virtual void object_begin() = 0;
    virtual void object_end  () = 0;

 public:
    object_stream() : m_helper(this) { }
    virtual ~object_stream() { }

    operator streamer& ()
    {   // Casted on return from method
        object_end();
        return m_helper;
    }

    // All fundamental types go here
    virtual object_stream& operator << (int x) = 0;

    // This catches all non-fundamental types and safely preserves
    // our type information while calling
    template <class T>
    object_stream& operator << (const T& x)
    {
        // Don't use conversion routine to cast us (not end of object)
        return *(m_helper << x).m_impl;
    }

 friend class streamer;
};

template <class T>
object_stream& streamer::operator << (const T& x)
{
    m_impl->object_begin();
    return *m_impl << x;
}

// Concrete streamer

class paran_object_stream : public object_stream
{
 protected:
    void object_begin() { cout << "[ "; }
    void object_end  () { cout << "] "; }

 public:
    paran_object_stream& operator << (int x)
    {
        cout << x << " ";
        return *this;
    }
};

class paran_streamer : public streamer
{
 protected:
    paran_object_stream m_obj;

 public:
    paran_streamer()
     : streamer(&m_obj) // a bit bad since it is not init'd, but since
    { }                 // we won't do anything in the base-class, ok
};

// Generic user

struct Foo { int x; };
streamer& operator << (streamer& o, const Foo& f)
{
    return o << f.x;
}

struct Bar { int a; Foo b; int c; int d; Foo e; };
streamer& operator << (streamer& o, const Bar& b)
{
    return o << b.a << b.b << b.c << b.d << b.e;
}

// test

int main()
{
    Bar b;
    b.a = 1;
    b.b.x = 2;
    b.c = 3;
    b.d = 4;
    b.e.x = 5;

    paran_streamer s;
    s << b;
    cout << endl;
}
Re: [boost] Re: Re: STL applied to disk
On Tue, Nov 19, 2002 at 12:52:19PM -0500, David Abrahams wrote:
> Wesley W. Terpstra [EMAIL PROTECTED] writes:
>> On Tue, Nov 19, 2002 at 10:38:27AM -0500, David Abrahams wrote:
>>> I haven't been paying attention, but IIUC what you're proposing, these things are no longer conforming iterators. The way to make random access iterators over disk storage is to build an iterator which stores its value_type internally. You can even arrange for it to construct the value_type in its internal storage on demand, so that it doesn't store anything until it is dereferenced.
>>
>> I assume you mean they are not iterators because operator-> is broken?
>
> And operator*.

From http://www.sgi.com/tech/stl/trivial.html:

    [1] The requirement for the return type of *x is specified as "convertible to T", rather than simply T, because it sometimes makes sense for an iterator to return some sort of proxy object instead of the object that the iterator conceptually points to. Proxy objects are implementation details rather than part of an interface (one use of them, for example, is to allow an iterator to behave differently depending on whether its value is being read or written), so the value type of an iterator that returns a proxy is still T.

Similar things can be found for vect[offset]. I am printing http://www.boost.org/libs/utility/iterator_adaptors.pdf to take home with me this evening to see what is in there.

>> What you are proposing however is flawed for several reasons. If I stored the value_type internally, this will break:
>>
>>     map::iterator i = ...;
>>     map::reference x = *i;
>>     ++i;
>>     x = ...; // what is x now pointing at? the wrong record.
>
> That code is already broken if it makes any assumptions about what x refers to after ++i. Sad but true.

Really? Are you certain about this? If you could give me a quote I would love to hear it. I know it is not going to work for Input Iterators, but what about a Forward Iterator?

What about:

    map::pointer p = i->fn_returning_this();
    ++i;

Is p now invalid?
I know that in the STL containers it is generally still ok (map, set, list, etc.), but that doesn't mean it is allowed. :-)

>> Also, if you have two iterators pointing at the same thing, but keeping distinct value_types internally, expressions like:
>>
>>     i->set_member_a(j->set_member_b(3) + 2);
>>
>> will break -- only one of the changes will make it to disk.
>
> You can get around this by dynamically allocating the value_type and keeping a cache of active values in the container... if it's important.

Errr... That is what I said, after all, in the part you just snipped. And aren't you the one who said "So what?" to partial conformance?

If the above expression fails to work, it is far worse than not providing operator->. The missing operator is detected at compile time; this could take a long time to track down.

Actually, even:

    i->set_bar(2);
    j->set_bar(4);

with i == j could write 2 to the disk with your internal value_type. Clearly not what the user intended, and very hard to detect by reading the code.

Therefore, you MUST have a common allocated value_type, which means you are going to need some way to find them. The information you have at the time you want to find them is:

    the unique memory location of the serialized item
    the unique sector+offset of the serialized item
    the serialized item
    a pointer to the session object, transaction, and database

I seek an efficient way to do this. So far, the sector+offset map is the best I can think of.

>> The whole question revolves around: is the overhead of such a table justified by the benefit of allowing member methods to be called on objects within the container.
>
> It depends on whether you're advertising STL compatibility or not. If not, do whatever you like and use a large, loud disclaimer when you write "iterator" (in quotes) in your documentation. If so, you have to bite the bullet and make the iterators conform.

I really do want them to conform. Don't read otherwise into my writing.
However, in the practical situation this comes from, speed and consistency of the data are probably more important than feature coverage.

>> There are significant costs:
>>
>>     the overhead of a redundant cache (it is already cached at the sector level)
>>     the overhead of indexing the map (considerable if you are just deserializing an int)
>>
>> My current answer is "not justified". But I am open to persuasion, especially in the form of an optimized solution.
>
> I think it's early to worry about optimization. Make it work first. An implementation which lies about its iterators is broken.

Man, that is a bit hard-line. If it fails to compile, that is far better than obscure crashes (or worse, data corruption) later.

The whole reason I started this thread was to find a way to implement i->foo_bar(); efficiently. I know
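The "sector+offset map" of shared, dynamically allocated values that this message argues for could look roughly like the sketch below. It is an illustration under assumptions, not the poster's implementation: `live_value_cache`, `disk_location`, `Record` and `load_record` are hypothetical names, the loader stands in for real deserialization, write-back on release is left out, and `std::shared_ptr`/`std::weak_ptr` (C++11) are used where 2002-era code would have had to use something like Boost's smart pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Hypothetical record location: (sector, offset) identifies a record on disk.
typedef std::pair<unsigned long, unsigned> disk_location;

// Hands out one shared in-memory object per on-disk record, so two iterators
// that refer to the same record also modify the same deserialized value.
template <class T>
class live_value_cache {
    std::map<disk_location, std::weak_ptr<T> > live_;
public:
    // `load` stands in for "read and deserialize the record"; it is only
    // invoked when nobody currently holds the value.
    template <class Loader>
    std::shared_ptr<T> acquire(const disk_location& where, Loader load) {
        std::weak_ptr<T>& slot = live_[where];
        std::shared_ptr<T> value = slot.lock();
        if (!value) {
            value = std::make_shared<T>(load(where));
            slot = value;
        }
        return value;
    }
    // Number of records currently held by at least one user.
    // (Expired map entries are not purged in this sketch.)
    std::size_t live_count() const {
        std::size_t n = 0;
        for (typename std::map<disk_location, std::weak_ptr<T> >::const_iterator
                 it = live_.begin(); it != live_.end(); ++it)
            if (!it->second.expired()) ++n;
        return n;
    }
};

struct Record { int a; int b; };

// Stand-in for deserialization from disk.
inline Record load_record(const disk_location&) { Record r = {0, 0}; return r; }

// Two "iterators" over the same record observe each other's changes.
inline int demo_shared_updates() {
    live_value_cache<Record> cache;
    const disk_location where(7, 128);
    std::shared_ptr<Record> i = cache.acquire(where, load_record);
    std::shared_ptr<Record> j = cache.acquire(where, load_record);
    assert(i == j);            // same object, not two temporaries
    j->b = 3;
    i->a = j->b + 2;
    return i->a * 10 + i->b;   // both updates are visible through i
}
```

The costs named in the thread show up directly: every dereference pays for a map lookup, and the cached objects duplicate data the database already caches at the sector level.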
[boost] STL applied to disk
Good afternoon!

I am looking at making an stl-compatible wrapper around a key-value database. It seems to me that such a wrapper would be widely useful since:

1. stdc++ algorithms could operate on the databases
2. switching a map<...> that grew too large to disk-backed becomes trivial
3. old reliable stl code could be reused on disk
4. a very gentle learning curve to existing C++ developers
5. it would be highly convenient to use
6. quite likely clever (ab)uses which I do not foresee would be possible

Obviously, any scheme like this would require serialisation of the key/data pairs. My solution thus far has been to include a SerialTraits<T> concept which provides a conversion method. Then the databases look like:

    MapDatabase<KeyTraits, DataTraits> db;

where the KeyTraits include the typename of the Key, and the serialisation methods. My comparison is always the lexical comparison on the serialised object.

Things have been going surprisingly well, but I have a problem that comes from the serialisation: references and members.

    map[key].non_const_member_fn();
    (*i).any_member_fn();

The stl (rightly) assumes all the objects are in their usual representation in RAM. Therefore, you can call const methods on them, and if they are mutable you can call non-const methods. This is a disaster.

Although one might naively claim mmap() could keep the representation of RAM on the disk, I would disagree since this would still impose arcane restrictions on the class member variables.

I have considered several solutions, none of which I consider fully adequate.

Solution #1: don't do that!

To fix (*i) one could use the fact that the specification merely says that *i be convertible to T and assignable. Therefore I could return a proxy object which serialised on assignment and deserialised on conversion.

Unfortunately, many legacy programs do (*i).fn() since i->fn() was unreliable in some compilers. This will not work with a proxy object. Further, i->fn() is impossible.

Solution #2: cache it!
I have also considered deserialising an object once, allowing modifications, etc., and then reserialising on commit. This would work out ok, except that it introduces baggage:

- I can't hold on to all the records that have been read, since the user might be touching more disk than RAM. Therefore, I would have to do some sort of reference counting in my iterators. This would unfortunately break any code which took a T& or T* from an iterator and held on to it.
- I am duplicating the read cache of the database in a wrapper. (On the plus side I am also saving deserialisation work.)

Solution #3: fuzzy template+inheritance tricks!

I figure there might be some clever way to return an object which looks like the data object, but really is not. Maybe by inheriting a template class from the contained class. Returning these might be able to do what is needed; e.g. on destruction, write back to the database library.

This seems like a good idea, but it is fraught with complications. Consider two iterators i and j which (happen by chance to) point to the same object.

    i->set_a(j->set_b(4) + 2);

Oops. You would expect both changes to work since they are presumably modifying different member variables. However, since i-> and j-> both read from disk and deserialised to a temporary, we are modifying two different temporaries. Therefore only one of the changes (whoever's object is destroyed last) will be made.

I have a good feeling about this solution though, as I think it conceivable that smart enough template code might be able to detect these cases.

Solution #4: ask someone smarter!

... that's you. :-)
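For concreteness, the proxy object of Solution #1 (deserialise on conversion, serialise on assignment) might look like the sketch below. All names are hypothetical: `raw_store` is an in-memory stand-in for the database, and `IntTraits` only imitates the spirit of the SerialTraits<T> concept mentioned in the post, whose real interface is not shown there.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Stand-in for the on-disk store: serialised key -> serialised value.
typedef std::map<std::string, std::string> raw_store;

// Hypothetical traits in the spirit of SerialTraits<T>; member names are assumptions.
struct IntTraits {
    typedef int value_type;
    static std::string serialise(int v) {
        std::ostringstream o; o << v; return o.str();
    }
    static int deserialise(const std::string& s) {
        std::istringstream i(s); int v = 0; i >> v; return v;
    }
};

// What *i could return: convertible to T and assignable from T, but not a T&.
template <class Traits>
class record_proxy {
    raw_store& store_;
    std::string key_;
public:
    typedef typename Traits::value_type T;
    record_proxy(raw_store& store, const std::string& key)
        : store_(store), key_(key) { }

    // Reading deserialises the stored bytes.
    operator T() const {
        raw_store::const_iterator it = store_.find(key_);
        assert(it != store_.end());   // the proxy must refer to an existing record
        return Traits::deserialise(it->second);
    }
    // Writing serialises straight back to the store.
    record_proxy& operator=(const T& v) {
        store_[key_] = Traits::serialise(v);
        return *this;
    }
};

inline int demo_proxy() {
    raw_store disk;
    disk["answer"] = "41";
    record_proxy<IntTraits> ref(disk, "answer");
    int v = ref;    // conversion: read + deserialise
    ref = v + 1;    // assignment: serialise + write
    return IntTraits::deserialise(disk["answer"]);
}
```

The limitation described in the post is visible here: for a class type T, an expression such as `ref.fn()` does not compile, because the proxy has no members of T, so `(*i).fn()` and `i->fn()` remain unsupported with this approach.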
Re: [boost] Re: STL applied to disk
First off, let me apologize; I should have given way more context in my original post. boost members are not psychic. :-)

On Tue, Nov 12, 2002 at 05:31:46PM +0200, Bohdan wrote:
> Wesley W. Terpstra [EMAIL PROTECTED] wrote in message news:20021112113226.GA466;ito.tu-darmstadt.de...
>> Good afternoon!
>>
>> I am looking at making an stl-compatible wrapper around a key-value database. It seems to me that such a wrapper would be widely useful since:
>>
>> 1. stdc++ algorithms could operate on the databases
>> 2. switching a map<...> that grew too large to disk-backed becomes trivial
>> 3. old reliable stl code could be reused on disk
>> 4. a very gentle learning curve to existing C++ developers
>> 5. it would be highly convenient to use
>> 6. quite likely clever (ab)uses which I do not foresee would be possible

I am answering your message out of sequence because I think it will be more clear.

> The other idea is that transaction object is needed here.

Yes, I already have that solved:

    You get a transaction object from the environment.
    You get a database object from the environment.
    You combine the two to get a session object.
    The session looks like an stl map (it issues iterators).
    When you commit the transaction, all the sessions using that transaction are invalidated, as are all the iterators that they issued.

... the problem I have is the calling of member methods through the iterator. I know I could do it with several approaches; I am trying to find the most elegant with minimal overhead.

> ObjectDatabases are very painful things. Unfortunately they are not too popular nowadays. The reason for this is simple, they are extremely difficult to implement and use (at least for c++).

Yes they are. Fortunately, I am not trying to write one. I have a *very* narrow functionality target. When you put it in the container, then it is on disk and in my control. When you take it out, it is not.

>> Obviously, any scheme like this would require serialisation of the key/data pairs.
>> My solution thus far has been to include a SerialTraits<T> concept which provides a conversion method. Then the databases look like:
>>
>>     MapDatabase<KeyTraits, DataTraits> db;
>
> You can use new boost::serialization library.

I have considered this and rejected it (the coupling, not boost::serialization). I would rather leave this entirely in user control. It would be a matter of adding a single template class if they wished to bridge the two themselves, so there is no functionality lost.

Further, it is important to control the serialized format so that the lexical sort order has desirable properties. This fine-grained control is not available under boost::serialization without writing a stream object for each desired lexical-sort serialization.

Also, this way I depend on fewer things.

> I have doubts if you can use std::map interface for your database class.

I agree that it is not designed for this purpose, but I think there are many beneficial emergent properties. It does not actually have to exactly conform; just conform with the subset used in existing practice.

I know from experience with a previous product that even something that closely approximates a std::map is highly useful. I just want to bring that approximation a bit closer so that code really can ignore the difference.

>> Although one might naively claim mmap() could keep the representation of RAM on the disk, I would disagree since this would still impose arcane restrictions on the class member variables.
>
> Well, there are two ways:
> 1. You need disk to reduce memory usage.
> 2. You need disk to persist objects.
> I'm not sure which one is yours. Did you ? Did I ?

I desire both of the two properties, you want me to choose? :-)
Is your comment here pertaining to mmap()?

>> This would unfortunately break any code which took a T& or T* from an iterator and held on to it.
> If you want to allow pointers and use them after application restart then use smart pointers: I've heard something about some system/processor tricks which allow to persist pointers, but i do not think it is good way.

You are considering serialization. I have no interest in this topic. My implementation presumes that you have picked one of the many available serialization tools or rolled your own.

This is especially not-so-important because I do not plan on supporting storing anything other than by-value. Further, since it looks like the stl I know all the type information and there is no polymorphism.

The T& and T* I was referring to are those obtained from the map::iterator class that is walking records. The user might dereference this iterator and take a reference to the object.

Rather than telling them to use a smart pointer, better would be to say: just keep the iterator! I am just concerned with breaking existing stl code. If possible, I would rather this could work, but I don't see how.

>> I am duplicating the read cache of the database in a wrapper
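The remark in this message about controlling the serialized format "so that the lexical sort order has desirable properties" can be illustrated with a key traits class that writes integers big-endian, so that byte-wise comparison of the serialised keys agrees with numeric comparison. The names `UInt32KeyTraits` and `bytes_less` are hypothetical, and the traits interface is an assumption, since the thread does not show the actual SerialTraits<T> members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical key traits: unsigned integers are written big-endian so that
// comparing the serialised bytes lexically gives the same order as comparing
// the numbers themselves.
struct UInt32KeyTraits {
    typedef std::uint32_t value_type;

    static std::string serialise(std::uint32_t v) {
        std::string out(4, '\0');
        for (int i = 3; i >= 0; --i) {      // least significant byte goes last
            out[i] = static_cast<char>(v & 0xFF);
            v >>= 8;
        }
        return out;
    }
    static std::uint32_t deserialise(const std::string& s) {
        assert(s.size() == 4);              // fixed-width encoding
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v = (v << 8) | static_cast<unsigned char>(s[i]);
        return v;
    }
};

// Byte-wise comparison, as a key-value store would typically apply to raw keys.
inline bool bytes_less(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char x = static_cast<unsigned char>(a[i]);
        const unsigned char y = static_cast<unsigned char>(b[i]);
        if (x != y) return x < y;
    }
    return a.size() < b.size();
}
```

A decimal text encoding would not have this property ("10" sorts before "9" byte-wise), and neither would a little-endian one, which is one reason a generic serialization library's default format may not be suitable for keys.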