Re: [boost] Re: Sockets - what's the latest?

2003-03-03 Thread Wesley W. Terpstra
On Wed, Feb 12, 2003 at 06:11:59PM -0500, Jason House wrote:
 Once I heard there was a generic socket library in development, I thought I'd add
 a quick feature request.   I would like to see the ability to have multiple
 streams through the same socket.
 
 This boils down to providing two distinct benefits.
 1:  Programs can easily perform complex communications over a single port.
 2: Without multiple streams, problems can occur when there are multiple clients
 behind a proxy connecting to a host outside of the proxy.  If the client only
 forms a single connection to the host, there won't be a problem because the random
 source port will differentiate each stream.   So, when multiple clients connect to
 a host from behind a proxy, the host can only differentiate each stream by the
 random source port.  So, when the clients form a second connection to the host,
 each stream is still differentiated from the others, but there is no mapping from
 a random source port to the distinct client.

What you are asking for here is:
http://chorus.sourceforge.net/
... in my slightly biased opinion. ;-)

This gives you a stackable transport framework which includes (among other
things) a multiplexed channel.
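
To make "multiple streams through the same socket" concrete, here is a minimal sketch of length-prefixed framing with a stream id. This is not the Chorus wire format or API; the frame layout and the names encode_frame and decode_frames are invented purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical frame layout (an assumption, not any real protocol):
//   1 byte stream id, 2 bytes big-endian payload length, then the payload.
std::string encode_frame(std::uint8_t stream_id, const std::string& payload)
{
    assert(payload.size() <= 0xFFFF);
    std::string frame;
    frame.push_back(static_cast<char>(stream_id));
    frame.push_back(static_cast<char>((payload.size() >> 8) & 0xFF));
    frame.push_back(static_cast<char>(payload.size() & 0xFF));
    frame += payload;
    return frame;
}

// Splits a buffer of frames back into one byte sequence per stream id.
// An incomplete trailing frame is simply ignored in this sketch.
std::map<std::uint8_t, std::string> decode_frames(const std::string& wire)
{
    std::map<std::uint8_t, std::string> streams;
    std::size_t pos = 0;
    while (pos + 3 <= wire.size()) {
        std::uint8_t id = static_cast<std::uint8_t>(wire[pos]);
        std::size_t hi = static_cast<unsigned char>(wire[pos + 1]);
        std::size_t lo = static_cast<unsigned char>(wire[pos + 2]);
        std::size_t len = (hi << 8) | lo;
        if (pos + 3 + len > wire.size()) break;
        streams[id].append(wire, pos + 3, len);
        pos += 3 + len;
    }
    return streams;
}
```

Because the stream id travels inside the connection, the receiver no longer needs the source port to tell the streams of one client apart.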

This project is still really new, but is shaping up nicely.

However, it has no relation to boost and is primarily targeted at building
distributed hash tables (p2p apps) that can go through any data channel.

---
Wes
___
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


Re: [boost] Serialization XML (was Serialization Library

2002-11-22 Thread Wesley W. Terpstra
Since no one seemed to notice my prior post, which I think addressed some of
these issues, I am reattaching it here.

On Thu, Nov 21, 2002 at 07:45:55AM -0800, Robert Ramey wrote:
 My question is whether XML can capture an arbitrary C++ structure in a
 meaningful and useful way.  So far no one has presented any XML that
 captures that one proposed example.

I did.

 Well, I don't know that.  In general it is extremely difficult to know ahead of
 time what facilities a serialization library would need in order to permit an XML
 archive to be generated.  One would have to take the library, make the
 changes necessary to provide the desired result, and check to see
 what changes were necessary.

You will not need any hooks; to fully bracket the data, you can use a
type-conversion trick made concrete below.

 * Some approaches, including XML, allow a practically unlimited number of 
 different ways to represent the same data. The user rather than the 
 serialization library should choose the particular design.

XSLT will allow this. As long as the serialization library can output to
SOME form of useful XML (such as the hierarchical format I propose), the
mapping between any particular schema and this format can be done as a
relatively straightforward stylesheet.

 In the current system the following concepts are orthogonal
 
 a) The description of which data should be saved for each class (save/load/version)
 b) composition of the above to handle arbitrary C++ data structures (serialization)
 c) description of how fundamental types should be encoded as a byte stream
 into a storage medium (archive)
 
 Assuming that the questions in my Thought experiment could be answered
 in the affirmative, what would have to be added to this system to permit
 it to handle XML?
 
 Another concept has to be added - that of reflection.  A useful XML
 representation needs the name of the variable.  So some system needs to be
 designed to hold that information and keep it related to each serializable
 member.  Presumably this would be an orthogonal concept d)

Yes; I had proposed in an earlier email a separate serializer which included
the name strings:
return o << "bar" << bar << "foo" << foo ...

This would provide the needed names to the system.
The trick below provides the required hierarchical information.
The XSLT provides user-customizable formats.
Your existing system for base classes can do the diamond work.

Alternately, the normal streamer could be adapted to take these names by
default, and ignore them for simpler data streams.

A clever use of macros might also make this automatic.
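
One way such a macro could look, as a minimal sketch: the preprocessor's stringizing operator turns the member name into the name string, so each member is written only once. The class xml_sketch_stream, the macro STREAM_MEMBER and the struct Sample are hypothetical names for illustration; they are not part of the serialization library under discussion.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical archive that writes each member as <name>value</name>.
class xml_sketch_stream
{
    std::ostringstream m_out;
 public:
    template <class T>
    xml_sketch_stream& member(const char* name, const T& value)
    {
        m_out << '<' << name << '>' << value << "</" << name << '>';
        return *this;
    }
    std::string str() const { return m_out.str(); }
};

// #m stringizes the member name, so the name and the value stay in sync.
#define STREAM_MEMBER(stream, obj, m) (stream).member(#m, (obj).m)

struct Sample
{
    int bar;
    int foo;
};

xml_sketch_stream& operator << (xml_sketch_stream& o, const Sample& s)
{
    STREAM_MEMBER(o, s, bar);
    STREAM_MEMBER(o, s, foo);
    return o;
}
```

A stream that does not care about names could provide the same member() signature and simply ignore its first argument.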

 Given this, without too much effort and maybe adding some virtual
 functions to archive one could add begin/end tags to archive.  Of course
 many would object to this on efficiency grounds but it would be possible. 
 But things start to appear. What about versioning? where does that fit
 into XML? But what about pointers, inheritance, etc.  to properly capture
 this in XML one would have to start altering b).  It's the automatic
 composition that guarantees that this system can serialize/deserialize any
 C++ structure.  I doubt this would be worth it.

I think that these are non-issues. Write them into the hierarchy in whatever
way is most convenient and still restorable. The user needs to decide how he
wants to represent these things himself in the style-sheet.

When deserializing, another style-sheet should fill in whatever extras the
user dropped from the output XML; such as version numbers, etc.

 Of course, anyone is free to take the current serialization system and
 experiment to see what it would really take to accommodate XML.  (After
 all, it should be easy if I'm wrong).  But it won't be me.

Here is a start: (but also the end of my contribution; I just thought this
type conversion was a neat trick that someone might want to use)

---

// Example begins
// Compiles and works with g++-2.95.4

#include <iostream>
using namespace std;

// Common Framework

class object_stream;

class streamer
{
 protected:
    object_stream*  m_impl;
    streamer(object_stream* stream) : m_impl(stream) { }

 public:
    template <class T>
    object_stream& operator << (const T& x);

 friend class object_stream;
};

class object_stream
{
 protected:
    streamer m_helper;

    virtual void object_begin() = 0;
    virtual void object_end  () = 0;

 public:
    object_stream() : m_helper(this) { }
    virtual ~object_stream() { }

    operator streamer& ()
    {   // Casted on return from method
        object_end();
        return m_helper;
    }

    // All fundamental types go here
    virtual object_stream& operator << (int x) = 0;

    // This catches all non-fundamental types and safely preserves
    // our type information while calling <<
    template <class T>

[boost] Bracketing a stream (was: Serialization to relational table)

2002-11-19 Thread Wesley W. Terpstra
On Tue, Nov 19, 2002 at 06:07:23PM +0100, Wesley W. Terpstra wrote:
 The trick is to use the FUNCTION boundary of the serializer.

<snip code>

I have attached a working prototype.

This is merely proof of concept; I am not sure whether one should bracket
fundamental types for instance.

The output is presently:
[ 1 [ 2 ] 3 4 [ 5 ] ] 
but maybe should be:
[ [1] [ [2] ] [3] [4] [ [5] ] ]

What do people think?
I am certain someone smarter than I could make this even more clever.

---
Wes

// Example begins
// Compiles and works with g++-2.95.4

#include <iostream>
using namespace std;

// Common Framework

class object_stream;

class streamer
{
 protected:
    object_stream*  m_impl;
    streamer(object_stream* stream) : m_impl(stream) { }

 public:
    template <class T>
    object_stream& operator << (const T& x);

 friend class object_stream;
};

class object_stream
{
 protected:
    streamer m_helper;

    virtual void object_begin() = 0;
    virtual void object_end  () = 0;

 public:
    object_stream() : m_helper(this) { }
    virtual ~object_stream() { }

    operator streamer& ()
    {   // Casted on return from method
        object_end();
        return m_helper;
    }

    // All fundamental types go here
    virtual object_stream& operator << (int x) = 0;

    // This catches all non-fundamental types and safely preserves
    // our type information while calling <<
    template <class T>
    object_stream& operator << (const T& x)
    {   // Don't use conversion routine to cast us (not end of object)
        return *(m_helper << x).m_impl;
    }

 friend class streamer;
};

template <class T>
object_stream& streamer::operator << (const T& x)
{
    m_impl->object_begin();
    return *m_impl << x;
}

// Concrete streamer

class paran_object_stream : public object_stream
{
 protected:
    void object_begin() { cout << "[ "; }
    void object_end  () { cout << "] ";  }

 public:
    paran_object_stream& operator << (int x)
    { cout << x << " "; return *this; }
};

class paran_streamer : public streamer
{
 protected:
    paran_object_stream m_obj;

 public:
    paran_streamer()
     : streamer(&m_obj) // a bit bad since it is not init'd, but since
    { } // we won't do anything in the base-class, ok
};

// Generic user

struct Foo
{
    int x;
};

streamer& operator << (streamer& o, const Foo& f)
{ return o << f.x; }

struct Bar
{
    int a;
    Foo b;
    int c;
    int d;
    Foo e;
};

streamer& operator << (streamer& o, const Bar& b)
{ return o << b.a << b.b << b.c << b.d << b.e; }

// test

int main()
{
    Bar b;
    b.a = 1;
    b.b.x = 2;
    b.c = 3;
    b.d = 4;
    b.e.x = 5;

    paran_streamer s;

    s << b;
    cout << endl;
}



Re: [boost] Re: Re: STL applied to disk

2002-11-19 Thread Wesley W. Terpstra
On Tue, Nov 19, 2002 at 12:52:19PM -0500, David Abrahams wrote:
 Wesley W. Terpstra [EMAIL PROTECTED] writes:
  On Tue, Nov 19, 2002 at 10:38:27AM -0500, David Abrahams wrote:
  I haven't been paying attention, but IIUC what you're proposing, these
  things are no longer conforming iterators.
  
  The way to make random access iterators over disk storage is to build
  an iterator which stores its value_type internally. You can even
  arrange for it to construct the value_type in its internal storage on
  demand, so that it doesn't store anything until it is dereferenced.
 
  I assume you mean they are not iterators because operator-> is
  broken?
 
 And operator*.

From http://www.sgi.com/tech/stl/trivial.html:

[1] The requirement for the return type of *x is specified as
convertible to T, rather than simply T, because it sometimes makes
sense for an iterator to return some sort of proxy object instead of
the object that the iterator conceptually points to. Proxy objects
are implementation details rather than part of an interface (one use
of them, for example, is to allow an iterator to behave differently
depending on whether its value is being read or written), so the
value type of an iterator that returns a proxy is still T.

Similar things can be found for vect[offset].

I am printing http://www.boost.org/libs/utility/iterator_adaptors.pdf
to take home with me this evening to see what is in there.

  What you are proposing however is flawed for several reasons.
 
  If I stored the value_type internally, this will break:
 
  map::iterator i = ...;
  map::reference x = *i;
  ++i;
  x = ...; // what is x now pointing at? the wrong record.
 
 That code is already broken if it makes any assumptions about what x
 refers to after ++i. Sad but true.

Really? Are you certain about this? If you could give me a quote I would
love to hear it. I know it is not going to work for Input iterators, but
what about a Forward Iterator?

What about
map::pointer p = i->fn_returning_this();
++i;

is p now invalid? I know that in the STL containers it is generally still
ok. (map, set, list, etc) but that doesn't mean it is allowed. :-)

  Also, if you have two iterators pointing at the same thing, but keeping
  distinct value_types internally, expressions like:
  i->set_member_a(j->set_member_b(3) + 2);
  will break -- only one of the changes will make it to disk.
 
 You can get around this by dynamically allocating the value_type and
 keeping a cache of active values in the container... if it's
 important.

Errr... That is what I said after all in the part you just snipped. 

And aren't you the one who said "So what?" to partial conformance?

If the above expression fails to work, it is far worse than not providing
operator->. The missing operator is detected at compile time; this could
take a long time to track down.

Actually, even:
i->set_bar(2);
j->set_bar(4);
with i==j could write 2 to the disk with your internal value_type.
Clearly not what the user intended, and very hard to detect when reading the code.

Therefore, you MUST have a common allocated value_type, which means you
are going to need some way to find them. The information you have at the
time you want to find them is:

the unique memory location of the serialized item
the unique sector+offset of the serialized item
the serialized item
a pointer to the session object, transaction, and database

I seek an efficient way to do this.
So far, the sector+offset map is the best I can think of.
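
A minimal sketch of such a sector+offset map follows. It assumes a modern compiler with std::shared_ptr and std::weak_ptr (not available with g++-2.95.4), and the names disk_addr, live_object_table and sample_record are hypothetical. Writing the object back to disk when the last holder releases it is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical disk address of a serialized record: (sector, offset).
typedef std::pair<std::uint32_t, std::uint32_t> disk_addr;

// Example record type, for illustration only.
struct sample_record
{
    int a;
    int b;
};

template <class T>
class live_object_table
{
    std::map<disk_addr, std::weak_ptr<T> > m_live;

 public:
    // Returns the single shared in-memory copy for this address.
    // load(addr) is called only if no iterator currently holds the object.
    template <class Loader>
    std::shared_ptr<T> get(const disk_addr& addr, Loader load)
    {
        std::weak_ptr<T>& slot = m_live[addr];
        std::shared_ptr<T> obj = slot.lock();
        if (!obj) {
            obj = std::make_shared<T>(load(addr));
            slot = obj;
        }
        return obj;
    }

    // Drops table entries whose object is no longer referenced.
    void purge()
    {
        for (auto it = m_live.begin(); it != m_live.end(); ) {
            if (it->second.expired()) it = m_live.erase(it);
            else ++it;
        }
    }

    std::size_t size() const { return m_live.size(); }
};
```

With this table, two iterators that refer to the same record share one object, so i->set_bar(2) followed by j->set_bar(4) modifies the same value_type. The cost is exactly the one weighed in this thread: one map lookup per dereference.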

  The whole question revolves around:
 
  is the overhead of such a table justified by the benefit of
  allowing member methods to be called on objects within the
  container.
 
 It depends on whether you're advertising STL compatibility or not. If
 not, do whatever you like and use a large, loud disclaimer when you
 write "iterator" (in quotes) in your documentation. If so, you have to
 bite the bullet and make the iterators conform.

I really do want them to conform. Don't read otherwise into my writing.
However, in the practical situation this comes from, speed and consistency
of the data are probably more important than feature coverage.

  There are significant costs:
  the overhead of redundant cache
  (it is already cached at the sector level)
  the overhead of indexing the map
  (considerable if you are just deserializing an int)
 
  My current answer is "not justified". But, I am open to persuasion,
  especially in the form of an optimized solution.
 
 I think it's early to worry about optimization. Make it work first. An
 implementation which lies about its iterators is broken.

Man, that is a bit hard-line. If it fails to compile, that is far better
than obscure crashes (or worse---data corruption) later.

The whole reason I started this thread was to find a way to implement
i->foo_bar(); efficiently. I know

[boost] STL applied to disk

2002-11-12 Thread Wesley W. Terpstra
Good afternoon!

I am looking at making an stl-compatible wrapper around a key-value database.

It seems to me that such a wrapper would be widely useful since:
  1. stdc++ algorithms could operate on the databases
  2. switching a map<...> that grew too large to a disk-backed one becomes trivial
  3. old reliable stl code could be reused on disk
  4. a very gentle learning curve to existing C++ developers
  5. it would be highly convenient to use
  6. quite likely clever (ab)uses which I do not foresee would be possible

Obviously, any scheme like this would require serialisation of the key/data
pairs. My solution thus far has been to include a SerialTraits<T> concept
which provides a conversion method. Then the databases look like:
MapDatabase<KeyTraits, DataTraits> db;
where the KeyTraits include the typename of the Key, and the serialisation
methods. My comparison is always the lexical comparison on the serialised
object.
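
As an illustration of what such a traits class might contain, here is a minimal sketch for a 32-bit unsigned key. The names UInt32Traits, serialise, deserialise and serialised_less are assumptions, not the actual interface. The encoding is big-endian so that the byte-wise (lexical) order of the serialised form matches numeric order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical traits class: names the in-memory type and converts it
// to and from the bytes stored in the database.
struct UInt32Traits
{
    typedef std::uint32_t value_type;

    // Big-endian: most significant byte first.
    static std::string serialise(value_type v)
    {
        std::string out(4, '\0');
        for (int i = 3; i >= 0; --i) {
            out[i] = static_cast<char>(v & 0xFF);
            v >>= 8;
        }
        return out;
    }

    static value_type deserialise(const std::string& bytes)
    {
        assert(bytes.size() == 4);
        value_type v = 0;
        for (int i = 0; i < 4; ++i)
            v = (v << 8) | static_cast<unsigned char>(bytes[i]);
        return v;
    }
};

// The comparison the database would apply: unsigned byte-wise order
// on the serialised objects.
inline bool serialised_less(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

A little-endian encoding would round-trip just as well, but iteration order in the database would then no longer follow the numeric order of the keys.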

Things have been going surprisingly well, but I have a problem that comes
from the serialisation: References and members.

map[key].non_const_member_fn();
(*i).any_member_fn();

The stl (rightly) assumes all the objects are in their usual representation
in RAM. Therefore, you can call const methods on them and if they are
mutable you can call non-const methods.

This is a disaster. 

Although one might naively claim mmap() could keep the representation of RAM
on the disk, I would disagree since this would still impose arcane
restrictions on the class member variables.

I have considered several solutions none of which I consider fully adequate.

Solution #1: don't do that!

To fix (*i) one could exploit the fact that the specification merely
requires *i to be convertible to T and assignable. Therefore I could
return a proxy object which serialised on assignment and deserialised
on conversion.

Unfortunately, many legacy programs do (*i).fn() since i->fn() was
unreliable in compilers. This will not work with a proxy object.

Further, i->fn() is impossible.
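
A minimal sketch of the proxy idea for T = int, with a std::map of strings standing in for the on-disk store. The names fake_disk and int_record_proxy are hypothetical and the textual encoding is chosen only for brevity.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Stand-in for the database: key -> serialised bytes (an assumption).
typedef std::map<std::string, std::string> fake_disk;

// What *i would return: reads deserialise, writes serialise.
class int_record_proxy
{
    fake_disk&  m_disk;
    std::string m_key;

 public:
    int_record_proxy(fake_disk& disk, const std::string& key)
        : m_disk(disk), m_key(key) { }

    // Conversion to T: deserialise the stored bytes (0 if the key is absent).
    operator int() const
    {
        fake_disk::const_iterator it = m_disk.find(m_key);
        std::istringstream in(it == m_disk.end() ? std::string() : it->second);
        int v = 0;
        in >> v;
        return v;
    }

    // Assignment from T: serialise and write back.
    int_record_proxy& operator=(int v)
    {
        std::ostringstream out;
        out << v;
        m_disk[m_key] = out.str();
        return *this;
    }
};
```

Reading and assigning through the proxy work, but the proxy has none of T's member functions, which is why (*i).fn() cannot be supported this way for class types.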

Solution #2: cache it!

I have also considered deserialising an object once, allowing
modifications, etc. Then on commit reserialising.

This would work out ok, except that it introduces baggage:

I can't hold on to all the records that have been read since the
user might be touching more disk than RAM. Therefore, I would have
to do some sort of reference counting in my iterators.

This would unfortunately break any code which took a T& or T*
from an iterator and held on to it.

I am duplicating the read cache of the database in a wrapper.
(on the plus side I am also saving deserialisation work)

Solution #3: fuzzy template+inheritance tricks!

I figure there might be some clever way to return an object which
looks like the data object, but really is not. Maybe by inheriting a
template class from the contained class. Returning these might be
able to do what is needed; eg: on destruction, write back to the
database library.

This seems like a good idea, but it is fraught with complications.
Consider two iterators i and j which (happen by chance to) point to the
same object.
i->set_a(j->set_b(4) + 2);

Oops. You would expect both changes to work since they are
presumably modifying different member variables. However, since
i-> and j-> both read from disk and deserialised to a temporary,
we are modifying two different temporaries. Therefore only one of
the changes (whoever's object destroys last) will be made.

I have a good feeling about this solution though as I think it
conceivable that smart enough template code might be able to detect
these cases.

Solution #4: ask someone smarter!

... that's you. :-)



Re: [boost] Re: STL applied to disk

2002-11-12 Thread Wesley W. Terpstra
First off, let me apologize; I should have given way more context in my
original post. boost members are not psychic. :-)

On Tue, Nov 12, 2002 at 05:31:46PM +0200, Bohdan wrote:
 
 Wesley W. Terpstra [EMAIL PROTECTED] wrote in message
 news:20021112113226.GA466;ito.tu-darmstadt.de...
  Good afternoon!
 
  I am looking at making an stl-compatible wrapper around a key-value database.
 
  It seems to me that such a wrapper would be widely useful since:
1. stdc++ algorithms could operate on the databases
2. switching a map<...> that grew too large to a disk-backed one becomes trivial
3. old reliable stl code could be reused on disk
4. a very gentle learning curve to existing C++ developers
5. it would be highly convenient to use
6. quite likely clever (ab)uses which I do not foresee would be possible

I am answering your message out of sequence because I think it will be more
clear.

 The other idea is that transaction object is needed here.

Yes, I already have that solved:

You get a transaction object from the environment.
You get a database object from the environment.
You combine the two to get a session object.
The session looks like an stl map (it issues iterators).
When you commit the transaction all the sessions using that transaction are
invalidated as are all the iterators that they issued.
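
The relationships just described can be sketched as follows. Every class and member name here is hypothetical, a std::map stands in for disk storage, and only the structure (environment issues transactions and databases, a session pairs the two, commit invalidates the sessions) is taken from the description above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class transaction
{
    bool m_open;
 public:
    transaction() : m_open(true) { }
    void commit() { m_open = false; }  // every dependent session becomes invalid
    bool open() const { return m_open; }
};

class database
{
 public:
    std::map<std::string, std::string> records;  // stand-in for disk storage
};

class environment
{
 public:
    std::shared_ptr<transaction> begin() { return std::make_shared<transaction>(); }
    std::shared_ptr<database> open_database() { return std::make_shared<database>(); }
};

// A session pairs one database with one transaction and looks like a map.
class session
{
    std::shared_ptr<database>    m_db;
    std::shared_ptr<transaction> m_txn;

 public:
    typedef std::map<std::string, std::string>::iterator iterator;

    session(std::shared_ptr<database> db, std::shared_ptr<transaction> txn)
        : m_db(db), m_txn(txn) { }

    bool valid() const { return m_txn->open(); }

    iterator begin() { assert(valid()); return m_db->records.begin(); }
    iterator end()   { assert(valid()); return m_db->records.end(); }

    std::string& operator[](const std::string& key)
    {
        assert(valid());
        return m_db->records[key];
    }
};
```

In this sketch invalidation is only a flag checked by assert; a real implementation would also have to invalidate the iterators already handed out.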

... the problem I have is the calling of member methods through the
iterator. I know I could do it with several approaches; I am trying
to find the most elegant with minimal overhead.

 ObjectDatabases are very painful things. Unfortunately
 they are not too popular nowadays. The reason for
 this is simple, they are extremely difficult to implement
 and use (at least for c++).

Yes they are. Fortunately, I am not trying to write one.
I have a *very* narrow functionality target.

When you put it in the container, then it is on disk and in my control.
When you take it out, it is not.

  Obviously, any scheme like this would require serialisation of 
  the key/data pairs. My solution thus far has been to include a
  SerialTraits<T> concept which provides a conversion method. Then 
  the databases look like: MapDatabase<KeyTraits, DataTraits> db;
 
 You can use new boost::serialization library.

I have considered this and rejected it (the coupling, not boost::serialization).

I would rather leave this entirely in user control. It would be a matter of
adding a single template class if they wished to bridge the two themselves,
so there is no functionality lost.

Further, it is important to control the serialized format so that the
lexical sort order has desirable properties. This fine grained control is
not available under boost::serialization without writing a stream object for
each desired lexical-sort serialization.

Also, this way I depend on fewer things.

 I have doubts if you can use std::map interface for your database class.

I agree that it is not designed for this purpose, but I think there are
many beneficial emergent properties. It does not actually have to exactly
conform; just conform with the subset used in existing practice.

I know from experience with a previous product that even something that
closely approximates a std::map is highly useful. I just want to bring that
approximation a bit closer so that code really can ignore the difference.

  Although one might naively claim mmap() could keep the representation of
  RAM on the disk, I would disagree since this would still impose arcane
  restrictions on the class member variables.
 
 Well, there are two ways:
1. You need disk to reduce memory usage.
2. You need disk to persist objects.
 I'm not sure which one is yours. Did you ?

Did I ?

I desire both of the two properties, you want me to choose? :-)
Is your comment here pertaining to mmap()?

  This would unfortunately break any code which took a T& or T*
  from an iterator and held on to it.
 
 If you want to allow pointers and use them after application restart
 then use smart pointers:
 
 I've heard something about some system/processor tricks which allow
 to persist pointers, but i do not think it is good way.

You are considering serialization. I have no interest in this topic.
My implementation presumes that you have picked one of the many available
serialization tools or rolled your own.

This is especially not-so-important because I do not plan on supporting
storing anything other than by-value. Further, since it looks like the stl,
I know all the type information and there is no polymorphism.

The T& and T* I was referring to are those obtained from the map::iterator
class that is walking records. The user might dereference this iterator and
take a reference to the object. Rather than telling them to use a smart
pointer, better would be to say: just keep the iterator! 

I am just concerned with breaking existing stl code.
If possible, I would rather this could work, but I don't see how.

  I am duplicating the read cache of the database in a wrapper