Thank you Steve! I look forward to using this or pitching in at later stages.
Regards,
-Stefan

On Sunday, 6 April 2014 10:22:17 UTC, Steve Coughlan wrote:
>
> I've spent the last few days playing with this and I've just pushed the
> results so far to
> https://github.com/shadders/orientdb/tree/binary-serialization/binary
>
> It needs a lot of work to get it integrated into ODB but it's a start and I
> wanted to get it up somewhere where the developers can look at it so I can
> start asking the questions I need to ask to get it to play with Orient-core.
>
> Currently I haven't touched orient-core's code so it's all in a separate
> project under the 'binary' directory. I have tried to align the classes
> with Orient's class structure though so I can gradually integrate it.
>
> Tomorrow I will do a proper writeup of where it's at, how I've specified
> the format, what questions I need to ask and what barriers I've come across
> with the orient internal API. For now the OBinarySerializer will only work
> serializing a document back and forth to an array. db.save(document)
> throws up a few problems which I need to ask questions about. It also only
> handles primitive type OTypes but that is not really a big deal as the
> format is quite agnostic to how an individual field is serialized so it's
> possibly just a matter of adapting existing serializers or building new
> ones for a few OTypes (which doesn't look too hard).
>
> I will start with one question though. My OBinaryDocument class inherits
> from ODocument and most constructors match and call super(sameParams) but
> for some reason when I save, the document doesn't generate an ORID. Problem
> is ODocument.clusterIds is null, but I can't find how they are set. Any
> hints?
>
> For now the record format is documented reasonably well in the class
> javadoc for ORecordHeader.
>
> On 22/03/14 09:00, [email protected] wrote:
>
> More here:
> https://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
>
> On Friday, 21 March 2014 21:55:28 UTC, [email protected] wrote:
>
> Hi,
>
> I came across this and found it interesting, perhaps you will too.
>
> https://www.arangodb.org/2012/07/11/infographic-comparing-space-usage-mongodb-couchdb-arangodb
>
> Regards,
> -Stefán
>
> On Tuesday, 18 March 2014 06:58:08 UTC, Steve Coughlan wrote:
>
> Sure... I'll clean it up a bit over the next few days and document a bit
> more clearly. Then I'll push it. No sense trying to figure out the whole
> API by myself when there's people who know it inside out. Good thinking ;)
>
> On 18/03/14 15:28, Luca Garulli wrote:
>
> Hi Steve,
> we're still closing 1.7, so we have a few months for 2.0. What I'd like is
> a new branch where everybody can look, contribute and compare approaches.
>
> Could you push it on a branch in GitHub? WDYT?
>
> Lvc@
>
> On 18 March 2014 06:21, Steve <[email protected]> wrote:
>
> Hi Luca,
>
> Apologies. I was able to make a PoC ok but I ran into lots of
> difficulties trying to integrate it into orient code (mainly to do with
> working out how not to break binary compatibility with the existing binary
> protocol). Then work went insanely busy for a while and I haven't got back
> to it yet. What sort of timelines are you looking at for v2.0? I know you are
> keen to get something like this into that version and obviously if you use
> my work your devs will need plenty of time to review and tweak it. If you
> can tell me how long you have then I can give you an idea whether I think I
> can realistically deliver or not. I would like to do this but I don't want
> to leave you waiting for me if you have resources that could do it sooner.
>
> regards,
>
> Steve
>
> On 18/03/14 15:09, Luca Garulli wrote:
>
> Hi Steve,
> have you had the chance to play with this? Any updates?
>
> Lvc@
>
> On 21 February 2014 19:01, Luca Garulli <[email protected]> wrote:
>
> On 20 February 2014 13:24, Steve <[email protected]> wrote:
>
> Hi Andrey,
>
> I forked orient-core today and spent most of the day playing around with
> the source trying to work out how to change over my pseudo schema,
> property, type classes into OSchema, OProperty, OType.
> ORecordSerializerDocument2Binary was very useful for understanding things.
> Is it actually in use? I can't find any references to it.
>
> AFAIK it's not used. It was just a prototype.
>
> Could you explain *"We have many third party drivers for binary protocol"*
> a bit more? Are there any examples?
>
> All the binary drivers manage directly the current serialization. The
> content is sent in binary form to the client and it has to unmarshall it. So
> all the binary drivers have implemented it.
>
> At the beginning we could marshall the content in the old form when we send
> it to the clients, based on client protocol version.
>
> I also have a question about ORID and whether it can be considered fixed
> length. It contains OClusterPosition which has two implementations. One
> is 8 bytes long and the other is 24 bytes long. For the purposes of
> serialization we can't consider the ORID to be fixed length unless we
> guarantee that every instance of ORID within a DB is only one of these
> implementations. Is this the case?
>
> Consider it as fixed length, the longer one is not yet used.
>
> At the moment I'm also wrestling with what to do about null fixed length
> fields and whether to reserve space inside a record. Whilst headers are
> ordered by schema_fixed_length, schema_variable_length, schema_less fields
> there's no reason data needs to follow the same order. But by default it
> probably would.
> Consider an object schema like this:
>
> class SomeClass {
>     update_time: DateTime //fixed length
>     short_string: String
>     massive_string: String
> }
>
> If we first write the record and update_time is null we'd have something
> like this:
>
> update_time: 0 bytes | short_string: 10 bytes | massive_string: 100 kbytes
>
> Then when we update it to add update_time we have a few options.
> 1/ When originally writing the object, reserve space even though the value
> is null (wasted space).
> 2/ Search for a hole, e.g. if short_string has been set to null we could
> steal its space.
> 3/ Write the update_time field after massive_string (if there is space
> before the beginning of the next record). Potentially we are writing into a
> different disk block, so for future reads when we aren't interested in
> massive_string we still have to load that block into memory.
> 4/ Rewrite the entire record.
>
> I suppose it is worth considering whether there's a benefit to reserving
> partial holes, i.e. if we have 10 * 4 byte nullable fixed length fields
> (all null on initial write) should we take a guess and reserve say 10 out
> of the 40 possible bytes for future updates? But I'm probably getting
> ahead of myself. I'll work on a simple implementation first before trying
> to be too clever ;)
>
> Good question.
>
> I think that reserving space for fixed length fields has the advantage
> of keeping the fixed size area as is, and fixed length fields are usually
> small, maximum 8 bytes each.
>
> By the way, datetimes are now stored as long, so probably a -1 could
> mean NULL. We should figure out how to represent NULL on each type.
>
> Lvc@
>
> On 20/02/14 20:12, Andrey Lomakin wrote:
>
> Hi Steve,
> Good that you are going to help us.
> A few additional points:
> 1. We already have binary serialization support, you can see it here:
> com.orientechnologies.common.serialization.types.OBinarySerializer so
> obviously we should not have several versions of the same.
> Also I think it will be interesting for you to look at this issue and
> discussion here
> https://github.com/orientechnologies/orientdb/issues/681#issuecomment-28466948.
> We discussed serialization of a single record (sorry, had no time to analyze
> it deeply because of a lot of events) but in the case of an SQL query you
> have to process millions of them.
> 2. We are working on binary compatibility mechanics too (I mean
> compatibility between storage formats); without it current users will not
> be able to adopt new features, especially binary serialization.
> 3. We have many third party drivers for the binary protocol (which pass
> serialized records to the client's side) so we have to think about how not
> to break the functionality of these drivers.
>
> On Wed, Feb 19, 2014 at 1:53 PM, Steve <[email protected]> wrote:
>
> Hi Luca,
>
> I'll give it a go with the real ODB code. The reason I didn't is because
> I'm actually quite new to ODB even as an end user but your instructions
> will set me in the right direction. Most of my experience with data
> serialization formats has been with Bitcoin which was mostly for network
> protocol use cases rather than big-data storage. But that was also a high
> performance scenario so I guess there are a lot of parallels.
>
> On 19/02/14
>
> ...
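
The exchange in this thread about null fixed-length fields (option 1, reserving space even when the value is null, together with the suggestion that a datetime stored as a long could use -1 for NULL) can be sketched roughly as below. This is only an illustration under stated assumptions: the class `ReservedSlotSketch`, its methods, and the byte layout are hypothetical and are not OrientDB's API or the format documented in ORecordHeader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch only; not OrientDB's record format or API. */
public class ReservedSlotSketch {
    // Assumption taken from the thread: a datetime is stored as a long and -1 stands for NULL.
    public static final long NULL_DATETIME = -1L;
    // The fixed area holds a single fixed-length field in this sketch: update_time.
    public static final int FIXED_AREA_SIZE = Long.BYTES;

    /** Lays the record out as [update_time: 8 bytes][length: 4 bytes][short_string bytes]. */
    public static byte[] write(Long updateTime, String shortString) {
        byte[] s = shortString.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(FIXED_AREA_SIZE + Integer.BYTES + s.length);
        // The 8-byte slot is reserved even when the value is null.
        buf.putLong(updateTime == null ? NULL_DATETIME : updateTime);
        buf.putInt(s.length);
        buf.put(s);
        return buf.array();
    }

    /** Overwrites only the reserved slot; the variable-length bytes are not moved. */
    public static void setUpdateTime(byte[] record, long updateTime) {
        ByteBuffer.wrap(record).putLong(0, updateTime);
    }

    /** Returns null when the slot holds the sentinel value. */
    public static Long readUpdateTime(byte[] record) {
        long raw = ByteBuffer.wrap(record).getLong(0);
        return raw == NULL_DATETIME ? null : raw;
    }

    /** Reads the variable-length string that follows the fixed area. */
    public static String readShortString(byte[] record) {
        int len = ByteBuffer.wrap(record).getInt(FIXED_AREA_SIZE);
        return new String(record, FIXED_AREA_SIZE + Integer.BYTES, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] record = write(null, "hello");
        System.out.println(readUpdateTime(record));
        setUpdateTime(record, 1392899040000L);
        System.out.println(readUpdateTime(record) + " " + readShortString(record));
    }
}
```

The trade-off is the one raised in the thread: the null record still costs 8 bytes, but a later update touches only the fixed area and leaves the record length unchanged. One limitation of the sentinel approach is that a genuine value of -1 (one millisecond before the epoch) can no longer be stored.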
