Re: [orientdb] Schema driven serialization #1890

stefan Mon, 07 Apr 2014 05:17:32 -0700

Thank you Steve! 

I look forward to using this, or pitching in at later stages.


Regards,
  -Stefan

On Sunday, 6 April 2014 10:22:17 UTC, Steve Coughlan wrote:
>
> I've spent the last few days playing with this and I've just pushed the 
> results so far to 
> https://github.com/shadders/orientdb/tree/binary-serialization/binary
>
> It needs a lot of work to get it integrated into ODB but it's a start and I 
> wanted to get it up somewhere where the developers can look at it so I can 
> start asking the questions I need to ask to get it to play with Orient-core.
>
> Currently I haven't touched orient-core's code so it's all in a separate 
> project under the 'binary' directory.  I have tried to align the classes 
> with Orient's class structure though so I can gradually integrate it.  
>
> Tomorrow I will do a proper writeup of where it's at, how I've specified 
> the format, what questions I need to ask and what barriers I've come across 
> with the orient internal API.  For now the OBinarySerializer will only work 
> serializing a document back and forth to an array.  db.save(document) 
> throws up a few problems which I need to ask questions about.  It also only 
> handles primitive type OTypes but that is not really a big deal as the 
> format is quite agnostic to how an individual field is serialized so it's 
> possibly just a matter of adapting existing serializers or building new 
> ones for a few OTypes (which doesn't look too hard).
>
> I will start with one question though.  My OBinaryDocument class inherits 
> from ODocument and most constructors match and call super(sameParams) but 
> for some reason when I save, the document doesn't generate an ORID.  Problem 
> is ODocument.clusterIds is null, but I can't find how they are set.  Any 
> hints?
>
> For now the record format is documented reasonably well in the class 
> javadoc for ORecordHeader.
>
>
> On 22/03/14 09:00, [email protected] wrote:
>  
>  
>  More here: 
> https://www.arangodb.org/2012/07/08/collection-disk-usage-arangodb
>
> On Friday, 21 March 2014 21:55:28 UTC, [email protected] wrote: 
>
> Hi, 
>
>  I came across this and found it interesting, perhaps you will too.
>
> https://www.arangodb.org/2012/07/11/infographic-comparing-space-usage-mongodb-couchdb-arangodb
>
>  Regards,
>   -Stefán
>
> On Tuesday, 18 March 2014 06:58:08 UTC, Steve Coughlan wrote: 
>
> Sure... I'll clean it up a bit over the next few days and document a bit 
> more clearly.  Then I'll push it.  No sense trying to figure out the whole 
> API by myself when there's people who know it inside out.  Good thinking ;)
>
> On 18/03/14 15:28, Luca Garulli wrote:
>  
> Hi Steve, 
> we're still closing 1.7, so we have a few months for 2.0. What I'd like is 
> a new branch where everybody can look, contribute and compare approaches.
>
>  Could you push it on a branch in GitHub? WDYT?
>
>  Lvc@
>
>  
>
> On 18 March 2014 06:21, Steve <[email protected]> wrote:
>
>  Hi Luca,
>
> Apologies.  I was able to make a PoC ok but I ran into lots of 
> difficulties trying to integrate it into orient code (mainly to do with 
> working out how not to break binary compatibility with the existing binary 
> protocol).  Then work went insanely busy for a while and I haven't got back 
> to it yet.  What sort of timelines are you looking at for v2.0?  I know you are 
> keen to get something like this into that version and obviously if you use 
> my work your devs will need plenty of time to review and tweak it.  If you 
> can tell me how long you have then I can give you an idea whether I think I 
> can realistically deliver or not.  I would like to do this but I don't want 
> to leave you waiting for me if you have resources that could do it sooner.
>
> regards,
>
> Steve 
>
>
>
> On 18/03/14 15:09, Luca Garulli wrote:
>   
>  Hi Steve, 
> have you had the chance to play with this? Any updates?
>
>  Lvc@
>
>  
>
> On 21 February 2014 19:01, Luca Garulli <[email protected]> wrote:
>
>   On 20 February 2014 13:24, Steve <[email protected]> wrote:
>
>  Hi Andrey,
>
> I forked orient-core today and spent most of the day playing around with 
> the source trying to work out how to change over my pseudo schema, 
> property, type classes into OSchema, OProperty, OType.  
> ORecordSerializerDocument2Binary was very useful for understanding things.  
> Is it actually in use?  I can't find any references to it.
>  
>
>  AFAIK it's not used. It was just a prototype.
>   
>
> Could you explain *"We have many third party drivers for binary protocol"* 
> a bit more?  Are there any examples?
>  
>
>  All the binary drivers handle the current serialization directly. The 
> content is sent in binary form to the client, which has to unmarshal it, so 
> all the binary drivers have implemented it.
>
>  At the beginning we could marshal the content in the old form when we send 
> it to the clients, based on client protocol version.
>   
>
> I also have a question about ORID and whether it can be considered fixed 
> length.  It contains OClusterPosition which has two implementations.  One 
> is 8 bytes long and the other is 24 bytes long.  For the purposes of 
> serialization we can't consider the ORID to be fixed length unless we 
> guarantee that every instance of ORID within a DB is only one of these 
> implementations.  Is this the case?
>  
>
>  Consider it as fixed length, the longer is not yet used.
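A minimal sketch of why a fixed-length ORID matters for serialization: with a constant width, the n-th record id in a run of links can be addressed by simple arithmetic. The class name and the 2-byte cluster id are assumptions for illustration only; the 8-byte position is the shorter OClusterPosition size mentioned above. This is not OrientDB's API.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-width record id encoding; field widths are assumptions. */
final class FixedRidCodec {
    static final int CLUSTER_ID_SIZE = Short.BYTES;      // assumed 2-byte cluster id
    static final int CLUSTER_POSITION_SIZE = Long.BYTES; // the 8-byte position from the thread
    static final int RID_SIZE = CLUSTER_ID_SIZE + CLUSTER_POSITION_SIZE;

    /** Writes the record id at slot {@code index}; the offset is index * RID_SIZE. */
    static void write(ByteBuffer buf, int index, short clusterId, long clusterPosition) {
        int offset = index * RID_SIZE; // constant-time addressing relies on the fixed width
        buf.putShort(offset, clusterId);
        buf.putLong(offset + CLUSTER_ID_SIZE, clusterPosition);
    }

    /** Reads the cluster id stored at slot {@code index}. */
    static short readClusterId(ByteBuffer buf, int index) {
        return buf.getShort(index * RID_SIZE);
    }

    /** Reads the cluster position stored at slot {@code index}. */
    static long readPosition(ByteBuffer buf, int index) {
        return buf.getLong(index * RID_SIZE + CLUSTER_ID_SIZE);
    }
}
```

If the 24-byte position were ever mixed in, the offset calculation above would no longer hold and each id would need a length or type marker.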
>   
>
> At the moment I'm also wrestling with what to do about null fixed length 
> fields and whether to reserve space inside a record.  Whilst headers are 
> ordered by schema_fixed_length, schema_variable_length, schema_less fields 
> there's no reason data needs to follow the same order.  But by default it 
> probably would.  Consider an object schema like this:
> class SomeClass {
>     update_time: DateTime //fixed length
>     short_string: String
>     massive_string: String
> }
>
> If we first write the record and update_time is null we'd have something 
> like this
> update_time:0 bytes|short_string: 10 bytes|massive_string:100kbytes
>
> Then we update it to add update_time we have a few options.
> 1/ When originally writing the object reserve space even though the value 
> is null (wasted space)
> 2/ Search for a hole.  e.g. if short_string has been set to null we could 
> steal its space.
> 3/ Write the update_time field after massive_string (if there is space 
> before the beginning of the next record).  Potentially we are writing into a 
> different disk block, so for future reads where we aren't interested in 
> massive_string we still have to load that block into memory.
> 4/ Rewrite the entire record.
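Option 1 above could be sketched roughly as follows, using the SomeClass example. Everything here is hypothetical: the class name, the layout (an 8-byte slot followed by length-prefixed strings) and the -1 null marker are assumptions for illustration, not the format in the branch. The point is only that a reserved slot lets update_time be set later without moving the strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout for SomeClass: reserve the fixed-length slot even when null. */
final class ReservedSlotRecord {
    static final int UPDATE_TIME_OFFSET = 0;       // fixed-length area starts the record
    static final int FIXED_AREA_SIZE = Long.BYTES; // 8 bytes reserved even when update_time is null
    static final long NULL_TIME = -1L;             // assumed null marker

    /** Serializes the record; a null update_time still occupies its slot. */
    static byte[] write(Long updateTime, String shortString, String massiveString) {
        byte[] s = shortString.getBytes(StandardCharsets.UTF_8);
        byte[] m = massiveString.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(FIXED_AREA_SIZE + 4 + s.length + 4 + m.length);
        buf.putLong(updateTime == null ? NULL_TIME : updateTime.longValue());
        buf.putInt(s.length).put(s); // variable-length fields are length-prefixed
        buf.putInt(m.length).put(m);
        return buf.array();
    }

    /** Sets update_time in place: only 8 bytes change and the record length is unchanged. */
    static void setUpdateTime(byte[] record, long updateTime) {
        ByteBuffer.wrap(record).putLong(UPDATE_TIME_OFFSET, updateTime);
    }

    /** Returns update_time, or null if the slot holds the null marker. */
    static Long readUpdateTime(byte[] record) {
        long raw = ByteBuffer.wrap(record).getLong(UPDATE_TIME_OFFSET);
        return raw == NULL_TIME ? null : Long.valueOf(raw);
    }
}
```

The cost is the one already noted for this option: 8 bytes per record are spent whether or not update_time is ever set.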
>
> I suppose it is worth considering whether there's a benefit to reserving 
> partial holes.  i.e. if we have 10 * 4 byte nullable fixed length fields 
> (all null on initial write) should we take a guess and reserve say 10 out 
> of the 40 possible bytes for future updates?  But I'm probably getting 
> ahead of myself.  I'll work on a simple implementation first before trying 
> to be too clever ;)
>
>
>  Good question.
>
>  I think that reserving space for fixed length fields has the advantage 
> of keeping the fixed size area as is, and fixed length fields are usually 
> small, maximum 8 bytes each.
>
>  By the way, datetimes are now stored as long, so probably a -1 could 
> mean NULL. We should figure out how to represent NULL for each type.
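The -1 convention suggested above might look something like this for a datetime stored as a long. This is a sketch only; the class and method names are hypothetical and not part of OrientDB. One thing to keep in mind is that -1 is also a real instant (one millisecond before the epoch), so a value like Long.MIN_VALUE may be a safer marker if such dates can occur.

```java
import java.nio.ByteBuffer;
import java.util.Date;

/** Hypothetical helper: fixed 8-byte encoding of a nullable datetime. */
final class NullableDateTimeCodec {
    /** Marker suggested in the thread; collides with the instant 1 ms before the epoch. */
    static final long NULL_SENTINEL = -1L;
    static final int SIZE = Long.BYTES;

    /** Writes the datetime as epoch milliseconds, or the sentinel when value is null. */
    static void write(ByteBuffer buf, int offset, Date value) {
        buf.putLong(offset, value == null ? NULL_SENTINEL : value.getTime());
    }

    /** Reads the slot back, mapping the sentinel to null. */
    static Date read(ByteBuffer buf, int offset) {
        long raw = buf.getLong(offset);
        return raw == NULL_SENTINEL ? null : new Date(raw);
    }
}
```

Because the slot is always 8 bytes, a null and a non-null value take the same space, which fits the idea of keeping the fixed size area as is.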
>
>  Lvc@
>   
>
>  
> On 20/02/14 20:12, Andrey Lomakin wrote:
>  
> Hi Steve, 
> Good that you are going to help us.
> A few additional points:
> 1.  We already have binary serialization support, which you can see here 
> com.orientechnologies.common.serialization.types.OBinarySerializer so 
> obviously we should not have several versions of the same thing. Also I think it 
> will be interesting for you to look at this issue and discussion here 
> https://github.com/orientechnologies/orientdb/issues/681#issuecomment-28466948.
>  We discussed serialization of a single record (sorry, I had no time to analyze 
> it deeply because of a lot of events) but in the case of an SQL query you have to 
> process millions of them. 
> 2.  We are working on binary compatibility mechanics too (I mean 
> compatibility between storage formats); without it current users will not 
> be able to use new features, especially binary serialization.
>  3.  We have many third party drivers for the binary protocol (which pass 
> serialized records to the client's side) so we have to think about how not to break 
> the functionality of these drivers.
>
>   
>
> On Wed, Feb 19, 2014 at 1:53 PM, Steve <[email protected]> wrote:
>
>  Hi Luca,
>
> I'll give it a go with the real ODB code.  The reason I didn't is because 
> I'm actually quite new to ODB even as an end user but your instructions 
> will set me in the right direction.  Most of my experience with data 
> serialization formats has been with Bitcoin which was mostly for network 
> protocol use cases rather than big-data storage.  But that was also a high 
> performance scenario so I guess there are a lot of parallels. 
>
>
> On 19/02/14 
>
> ...
