Re: [orientdb] Confusion over variable size integer encoding in binary protocol

Michael Peterson Mon, 02 Feb 2015 04:52:47 -0800

Thanks Emanuel.

It looks like the Google Protobuf spec 
(https://developers.google.com/protocol-buffers/docs/encoding?csw=1) uses 
*little-endian* encoding for both varints *and* non-varint numbers.
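
For anyone following along, here is a minimal Go sketch of the Protobuf-style scheme (ZigZag mapping, then base-128 groups emitted least-significant first, with the high bit of each byte as a continuation flag). It relies on the standard library's encoding/binary package, whose Varint functions follow the Protobuf encoding; the helper names encodeVarint and decodeVarint are my own. Whether OrientDB matches this byte for byte is an assumption based on Emanuele's reply, not something I have confirmed.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarint returns the ZigZag + base-128 varint encoding of v,
// as produced by the Go standard library (Protobuf-compatible).
// Assumption: OrientDB's schemaless serialization uses the same layout.
func encodeVarint(v int64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, v)
	return buf[:n]
}

// decodeVarint reads one ZigZag varint from b and returns the value and
// the number of bytes consumed (n <= 0 signals an error, as in binary.Varint).
func decodeVarint(b []byte) (int64, int) {
	return binary.Varint(b)
}

func main() {
	// Values chosen around the 1-byte and 2-byte boundaries from the docs.
	for _, v := range []int64{0, -1, 1, 63, 64, -64, -65, 8191, 8192} {
		enc := encodeVarint(v)
		dec, n := decodeVarint(enc)
		fmt.Printf("%6d -> % x (%d byte(s)), decoded %d\n", v, enc, n, dec)
	}
}
```

With this scheme 63 and -64 fit in one byte while 64 and -65 need two, which lines up with the documented ranges (7 usable bits per byte) rather than with UTF-8 marker bits.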


Can you clarify what the endian-ness is for the network binary protocol and 
the schemaless serialization in OrientDB?

From my (limited) experimentation so far, it looks like the network binary 
protocol uses big-endian.  What about float types in the schemaless 
serialization?
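
To make the question concrete, this is a rough Go sketch of how I am currently reading fixed-width values, assuming big-endian byte order. The big-endian choice for the network protocol is only what my experiments suggest, and applying it to floats in the schemaless serialization is a guess pending an answer; readInt32BE and readFloat32BE are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// readInt32BE interprets the first 4 bytes of b as a big-endian signed int.
// Assumption: the network binary protocol sends ints most significant byte first.
func readInt32BE(b []byte) int32 {
	return int32(binary.BigEndian.Uint32(b))
}

// readFloat32BE interprets the first 4 bytes of b as a big-endian IEEE 754 float.
// Assumption: floats use the same byte order as ints (unconfirmed).
func readFloat32BE(b []byte) float32 {
	return math.Float32frombits(binary.BigEndian.Uint32(b))
}

func main() {
	raw := []byte{0x00, 0x00, 0x01, 0x2c}
	fmt.Println("big-endian int32:   ", readInt32BE(raw)) // 300
	// The same bytes read little-endian give a very different number,
	// which is why the byte order needs to be documented.
	fmt.Println("little-endian int32:", int32(binary.LittleEndian.Uint32(raw)))
	fmt.Println("big-endian float32: ", readFloat32BE([]byte{0x3f, 0x80, 0x00, 0x00})) // 1
}
```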

-Michael

On Sunday, February 1, 2015 at 3:53:37 PM UTC-5, Emanuele wrote:
>
>  Hi, 
>
> Sorry, the documentation was not perfect; I am updating it right now. The 
> UTF-8 way was an experiment, but the protocol uses the varint encoding as 
> the protobuf spec says :),
>
> bye 
> Emanuel
>
> On 31/01/15 03:19, Michael Peterson wrote:
>  
> Hi,
>
> I've just recently started an initial effort to write a Go (golang) driver 
> for OrientDB. I'm starting with the binary protocol and I have some 
> questions.
>
> In the "field_data serialization by type" section of this document: 
> http://www.orientechnologies.com/docs/last/orientdb.wiki/Record-Schemaless-Binary-Serialization.html
>  
> it states that variable size integers are "implemented in the same way of 
> UTF-8".  But the ranges then given contradict that statement:
>
> -64 < value < 64 1 byte
> -8192 < value < 8192 2 byte
> -1048576 < value < 1048576 3 byte
> -134217728 < value < 134217728 4 byte
> -17179869184 < value < 17179869184 5 byte
>
> If you are truly using UTF-8 "marker" bits, then a 2 byte varint would be 
> of the form:
>
> 110xxxxx 10yyyyyy
>
> which leaves only 11 bits free, but your range of -8192 < value < 8192 for 
> 2 bytes implies that you have 14 bits available.  I then found this 
> reference: 
>
>
> https://groups.google.com/forum/#!searchin/orient-database/varint$20variable$20length$20int/orient-database/8r1ES_LEDxE/rwdpxjMr-BQJ
>
> which indicates that you are using the high bit in every byte to indicate 
> whether there is another byte (1=yes).  Again, that is not actually how 
> UTF-8 works.  UTF-8 lets you tell how many total bytes are used by 
> parsing only the first byte, and the subsequent bytes each have only 6 
> bits free, not 7.
>
> So the documentation should be clarified I think.  (Examples would be even 
> better!)
>
> Also two other questions:
>
> * could you also specify whether all integer types are encoded big-endian, 
> including varints?
>
> * can you confirm that you are using ZigZag encoding for varints?  The 
> first reference above says you are, but the second one (the Google Group 
> link) has no mention of it.  If you are using ZigZag encoding, can you 
> confirm that you are using the form used in Google's Protocol Buffers as 
> documented here:  
> https://developers.google.com/protocol-buffers/docs/encoding?csw=1 ?
>
> Thanks very much for your help,
> Michael
>

