Hi, I came across this and found it interesting, perhaps you will too. https://www.arangodb.org/2012/07/11/infographic-comparing-space-usage-mongodb-couchdb-arangodb
Regards,
-Stefán

On Tuesday, 18 March 2014 06:58:08 UTC, Steve Coughlan wrote:
>
> Sure... I'll clean it up a bit over the next few days and document it a bit
> more clearly. Then I'll push it. No sense trying to figure out the whole
> API by myself when there are people who know it inside out. Good thinking ;)
>
> On 18/03/14 15:28, Luca Garulli wrote:
>
> Hi Steve,
> we're still closing 1.7, so we have a few months for 2.0. What I'd like is
> a new branch where everybody can look, contribute and compare approaches.
>
> Could you push it on a branch in GitHub? WDYT?
>
> Lvc@
>
> On 18 March 2014 06:21, Steve <[email protected]> wrote:
>
> Hi Luca,
>
> Apologies. I was able to make a PoC OK, but I ran into lots of
> difficulties trying to integrate it into the Orient code (mainly to do with
> working out how not to break binary compatibility with the existing binary
> protocol). Then work got insanely busy for a while and I haven't got back
> to it yet. What sort of timelines are you looking at for v2.0? I know you are
> keen to get something like this into that version, and obviously if you use
> my work your devs will need plenty of time to review and tweak it. If you
> can tell me how long you have, then I can give you an idea whether I think I
> can realistically deliver or not. I would like to do this, but I don't want
> to leave you waiting for me if you have resources that could do it sooner.
>
> regards,
>
> Steve
>
> On 18/03/14 15:09, Luca Garulli wrote:
>
> Hi Steve,
> have you had the chance to play with this? Any updates?
>
> Lvc@
>
> On 21 February 2014 19:01, Luca Garulli <[email protected]> wrote:
>
> On 20 February 2014 13:24, Steve <[email protected]> wrote:
>
> Hi Andrey,
>
> I forked orient-core today and spent most of the day playing around with
> the source trying to work out how to change over my pseudo schema,
> property, type classes into OSchema, OProperty, OType.
> ORecordSerializerDocument2Binary was very useful for understanding things.
> Is it actually in use? I can't find any references to it.
>
>
> AFAIK it's not used. It was just a prototype.
>
>
> Could you explain *"We have many third party drivers for binary protocol"*
> a bit more? Are there any examples?
>
>
> All the binary drivers directly manage the current serialization. The
> content is sent in binary form to the client, which has to unmarshall it,
> so all the binary drivers have implemented it.
>
> At the beginning we could marshall the content in the old form when we send
> it to the clients, based on the client protocol version.
>
>
> I also have a question about ORID and whether it can be considered fixed
> length. It contains OClusterPosition, which has two implementations. One
> is 8 bytes long and the other is 24 bytes long. For the purposes of
> serialization we can't consider the ORID to be fixed length unless we
> guarantee that every instance of ORID within a DB is only one of these
> implementations. Is this the case?
>
>
> Consider it as fixed length, the longer one is not yet used.
>
>
> At the moment I'm also wrestling with what to do about null fixed length
> fields and whether to reserve space inside a record. Whilst headers are
> ordered by schema_fixed_length, schema_variable_length, schema_less fields,
> there's no reason data needs to follow the same order. But by default it
> probably would. Consider an object schema like this:
> class SomeClass {
>     update_time: DateTime  // fixed length
>     short_string: String
>     massive_string: String
> }
>
> If we first write the record and update_time is null, we'd have something
> like this:
> update_time: 0 bytes | short_string: 10 bytes | massive_string: 100 kbytes
>
> Then, if we update it to add update_time, we have a few options:
> 1/ When originally writing the object, reserve space even though the value
> is null (wasted space).
> 2/ Search for a hole, e.g. if short_string has been set to null we could
> steal its space.
> 3/ Write the update_time field after massive_string (if there is space
> before the beginning of the next record). Potentially we are writing into a
> different disk block, so for future reads where we aren't interested in
> massive_string we still have to load that block into memory.
> 4/ Rewrite the entire record.
>
> I suppose it is worth considering whether there's a benefit to reserving
> partial holes, i.e. if we have 10 * 4 byte nullable fixed length fields
> (all null on initial write), should we take a guess and reserve, say, 10 out
> of the 40 possible bytes for future updates? But I'm probably getting
> ahead of myself. I'll work on a simple implementation first before trying
> to be too clever ;)
>
>
> Good question.
>
> I think that reserving space for fixed length fields has the advantage
> of keeping the fixed size area as is, and fixed length fields are usually
> small, maximum 8 bytes each.
>
> By the way, datetimes are now stored as long, so probably a -1 could
> mean NULL. We should figure out how to represent NULL for each type.
>
> Lvc@
>
> On 20/02/14 20:12, Andrey Lomakin wrote:
>
> Hi Steve,
> Good that you are going to help us.
> A few additional points:
> 1. We already have binary serialization support, you can see it here:
> com.orientechnologies.common.serialization.types.OBinarySerializer so
> obviously we should not have several versions of the same thing. Also I think
> it will be interesting for you to look at this issue and the discussion here:
> https://github.com/orientechnologies/orientdb/issues/681#issuecomment-28466948.
> We discussed serialization of a single record (sorry, I had no time to analyze
> it deeply because of a lot of events), but in the case of an SQL query you
> have to process millions of them.
> 2. We are working on binary compatibility mechanics too (I mean
> compatibility between storage formats); without it current users will not
> be able to use new features, especially binary serialization.
> 3. We have many third party drivers for the binary protocol (which pass
> serialized records to the client side), so we have to think about how not to
> break the functionality of these drivers.
>
> On Wed, Feb 19, 2014 at 1:53 PM, Steve <[email protected]> wrote:
>
> Hi Luca,
>
> I'll give it a go with the real ODB code. The reason I didn't is because
> I'm actually quite new to ODB even as an end user, but your instructions
> will set me in the right direction. Most of my experience with data
> serialization formats has been with Bitcoin, which was mostly for network
> protocol use cases rather than big-data storage. But that was also a high
> performance scenario, so I guess there are a lot of parallels.
>
> On 19/02/14 21:33, Luca Garulli wrote:
>
> Hi Steve,
> sorry for the delay.
>
> I like your ideas, I think this is the right direction. varint8 and
> varint16 could be a good way to save space, but we should consider when
> this slows down some use cases, like partial field loading.
>
> About the POC you created, I think it would be much more useful if you
> played with real documents. It's easy, and you could push it to a separate
> branch to let us and other developers contribute & test. WDYT?
>
> Follow these steps:
>
> (1) create your serializer
>
> This is the skeleton of the class to implement:
>
> public class BinaryDocumentSerializer implements ORecordSerializer {
>   public static final String NAME = "binarydoc";
>
>   // UN-MARSHALLING
>   public ORecordInternal<?> fromStream(final byte[] iSource) {
>   }
>
>   // PARTIAL UN-MARSHALLING
>   public ORecordInternal<?> fromStream(final byte[] iSource,
>       final ORecordInternal<?> iRecord, String[] iFields) {
>   }
>
>   // MARSHALLING
>   public byte[] toStream(final ORecordInternal<?> iSource, boolean iOnlyDelta) {
>   }
> }
>
> (2) register your implementation
>
> ORecordSerializerFactory.instance().register(BinaryDocumentSerializer.NAME,
>     new BinaryDocumentSerializer());
>
> (3) create a new ODocument subclass
>
> Then create a new class that extends ODocument but uses your
> implementation:
>
> public class BinaryDocument extends ODocument {
>   protected void setup() {
>     super.setup();
>     _recordFormat =
>         ORecordSerializerFactory.instance().getFormat(BinaryDocumentSerializer.NAME);
>   }
> }
>
> (4) Try it!
>
> And now try to create a BinaryDocument, set fields and call .save(). The
> method BinaryDocumentSerializer.toStream() will be called.
>
> ...
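
For anyone following the thread, here is a minimal, self-contained sketch of option 1/ from Steve's list (reserve the fixed-length slot even when the value is null), with the strings stored behind varint length prefixes. Everything in it is an assumption made for illustration: the class name FixedSlotRecordSketch, the field layout, the Long.MIN_VALUE null sentinel (the thread floats -1, but -1 is also a valid epoch-millis value) and the LEB128-style varint, which may not be the varint8/varint16 encoding discussed above. It does not use or reflect any OrientDB API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical record layout; none of these names come from OrientDB. */
final class FixedSlotRecordSketch {
    // Assumed sentinel for a null timestamp (not the -1 suggested in the thread).
    static final long NULL_TIME = Long.MIN_VALUE;
    static final int TIME_OFFSET = 0; // the fixed-length area starts the record
    static final int TIME_SIZE = 8;

    /** Layout: [update_time: 8 bytes][varint length + UTF-8 bytes] per string. */
    static byte[] write(Long updateTime, String shortString, String massiveString) {
        byte[] s1 = shortString.getBytes(StandardCharsets.UTF_8);
        byte[] s2 = massiveString.getBytes(StandardCharsets.UTF_8);
        // A non-negative int needs at most 5 varint bytes.
        ByteBuffer buf = ByteBuffer.allocate(TIME_SIZE + 5 + s1.length + 5 + s2.length);
        // The slot is always written, so a later update never moves the strings.
        buf.putLong(updateTime == null ? NULL_TIME : updateTime);
        putVarInt(buf, s1.length);
        buf.put(s1);
        putVarInt(buf, s2.length);
        buf.put(s2);
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    /** In-place update: overwrites the reserved slot, record length unchanged. */
    static void setUpdateTime(byte[] record, long updateTime) {
        ByteBuffer.wrap(record).putLong(TIME_OFFSET, updateTime);
    }

    /** Partial read: only touches the first 8 bytes of the record. */
    static Long readUpdateTime(byte[] record) {
        long v = ByteBuffer.wrap(record).getLong(TIME_OFFSET);
        return v == NULL_TIME ? null : v;
    }

    static String readShortString(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(TIME_SIZE);
        byte[] bytes = new byte[getVarInt(buf)];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Unsigned varint: 7 payload bits per byte, high bit set means "more bytes".
    static void putVarInt(ByteBuffer buf, int value) {
        while ((value & ~0x7F) != 0) {
            buf.put((byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put((byte) value);
    }

    static int getVarInt(ByteBuffer buf) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

The trade-off is the one raised in the thread: a null update_time still costs 8 bytes, but setting it later is a fixed-offset overwrite and reading it never has to walk past the variable-length strings. Reading a string, by contrast, means decoding each length prefix in front of it, which is the kind of cost Luca mentions for partial field loading.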
