[orientdb] Re: Differing record serialization formats in binary protocol?

Michael Peterson Sat, 14 Feb 2015 12:03:43 -0800

Hi Chris,

Thanks for the quick reply and the pointer to the code. I believe that answers 
my first and third questions:
1. Was my interpretation of the serialization formats correct?  *Yes*
3. How do I tell the difference between the two forms?  The answer is *whether 
the zigzag-decoded length at the start of each header entry is positive 
(you have a property) or negative (you have a Document).*


I would still like to know what a cluster-id of -2 means for a property 
result; it seems to be hard-coded for properties. The long value that comes 
after it (cluster-pos), however, is not hard-coded. Does that value refer to 
an actual cluster position, or is it just an incrementing counter for the 
property "rows" being returned?

And finally, I'm still wondering what Emanuel meant when he said that the 
"schemafull serialization over network" will probably be removed soon. I 
don't know which serialization format that refers to. Is it the one used 
for Properties (but not Documents)?

Thanks very much,
Michael


On Saturday, February 14, 2015 at 10:57:20 AM UTC-5, Christian Kramer wrote:
>
> Hey, 
>
> so from my point of view, on the one hand you query a record/document, 
> and on the other hand you query a property. So the result of the first 
> query is represented as records and the result of the second as properties. 
> See the deserialize method in 
> orientdb/core/src/main/java/com/orientechnologies/orient/core/serialization/serializer/record/binary/ORecordSerializerBinaryV0.java:
>
> if (len == 0) {
>   // SCAN COMPLETED
>   break;
> } else if (len > 0) {
>   // PARSE FIELD NAME
>   fieldName = new String(bytes.bytes, bytes.offset, len, utf8);
>   bytes.skip(len);
>   valuePos = readInteger(bytes);
>   type = readOType(bytes);
> } else {
>   // LOAD GLOBAL PROPERTY BY ID
>   OGlobalProperty prop = getGlobalProperty(document, len);
>   fieldName = prop.getName();
>   valuePos = readInteger(bytes);
>   if (prop.getType() != OType.ANY)
>     type = prop.getType();
>   else
>     type = readOType(bytes);
> }
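For readers porting this to another client, here is a minimal self-contained sketch of the same header walk, written against the byte layout discussed in this thread rather than OrientDB's own classes. It assumes zigzag varints for lengths, big-endian 4-byte data pointers, and a one-byte type id after an inline field name. `HeaderWalk`, `GlobalProp`, and the id-to-property map are hypothetical stand-ins: in OrientDB the global property table comes from the database schema, not from the record bytes, and a `null` type here plays the role of `OType.ANY`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical port of the header walk; not OrientDB's own API.
final class HeaderWalk {
    // Stand-in for a schema global property; type == null plays the role of OType.ANY.
    record GlobalProp(String name, Integer type) {}
    // One decoded header entry: field name, data pointer, raw type id.
    record Entry(String name, int valuePos, int type) {}

    private final byte[] buf;
    private int pos;

    HeaderWalk(byte[] buf) { this.buf = buf; }

    // Reads a varint (7 data bits per byte, low-order group first) and zigzag-decodes it.
    private int readSignedVarInt() {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (int) ((raw >>> 1) ^ -(raw & 1));
    }

    // Reads a big-endian 4-byte int (the data pointers in the header).
    private int readInt() {
        int v = ((buf[pos] & 0xFF) << 24) | ((buf[pos + 1] & 0xFF) << 16)
              | ((buf[pos + 2] & 0xFF) << 8) | (buf[pos + 3] & 0xFF);
        pos += 4;
        return v;
    }

    // Walks the version byte, class name, and header entries up to the 0 terminator.
    List<Entry> walk(Map<Integer, GlobalProp> globalProps) {
        pos = 1;                                  // skip serialization version
        int classNameLen = readSignedVarInt();
        pos += classNameLen;                      // skip class name bytes
        List<Entry> entries = new ArrayList<>();
        while (true) {
            int len = readSignedVarInt();
            if (len == 0) {
                break;                            // scan completed
            } else if (len > 0) {                 // inline field name
                String name = new String(buf, pos, len, StandardCharsets.UTF_8);
                pos += len;
                int valuePos = readInt();
                entries.add(new Entry(name, valuePos, buf[pos++]));
            } else {                              // global property id = -len - 1
                int id = -len - 1;
                GlobalProp prop = globalProps.get(id);
                if (prop == null) throw new IllegalStateException("unknown global property " + id);
                int valuePos = readInt();
                int type = prop.type() != null ? prop.type() : buf[pos++];
                entries.add(new Entry(prop.name(), valuePos, type));
            }
        }
        return entries;
    }

    // Reads a STRING value at an absolute offset: zigzag varint length, then UTF-8 bytes.
    String readStringAt(int offset) {
        pos = offset;
        int len = readSignedVarInt();
        return new String(buf, pos, len, StandardCharsets.UTF_8);
    }
}
```

Fed the two "Honda" records from Michael's trace, with an assumed table {23 → make, 24 → model} (both type 7, STRING), the full-document record yields entries (make, 17) and (model, 23), while the projection record yields a single inline entry (make, 13); in both cases the first pointer leads to "Honda".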
>
>
>
>
> Cheers, 
> Chris
>
> On Saturday, February 14, 2015 at 14:52:43 UTC+1, Michael Peterson wrote:
>>
>> Hello,
>>
>> I am continuing to work on a Go client and trying to implement the 
>> Network Binary Protocol, but I've hit another server response I don't 
>> understand.
>>
>> I am doing a query using REQUEST_COMMAND (synchronous) and querying a 
>> Document (not using Graphs yet). When I query for the full document, the 
>> serialized record seems to be in a slightly different format than when I 
>> query for a field of the record.
>>
>> For background, here's the query in the Java shell:
>>
>>     orientdb {db=cars}> select * from Carz;   
>>     ----+-----+------+---------+------
>>     #   |@RID |@CLASS|make     |model 
>>     ----+-----+------+---------+------
>>     0   |#13:0|Carz  |Honda    |Accord
>>     1   |#13:1|Carz  |Chevrolet|Tahoe 
>>     ----+-----+------+---------+------
>>     
>>     2 item(s) found. Query executed in 0.006 sec(s).
>>     orientdb {db=cars}> select make from Carz;
>>     ----+------+---------
>>     #   |@CLASS|make     
>>     ----+------+---------
>>     0   |null  |Honda    
>>     1   |null  |Chevrolet
>>     ----+------+---------
>>
>>
>> When I do a REQUEST_COMMAND query with the query "select * from Carz", I 
>> get back a serialized record that does not quite follow the Schemaless 
>> Serialization 
>> (https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md).
>> Instead, I get back the "alternative" serialization format I outlined in my 
>> previous posting: 
>> https://groups.google.com/d/msg/orient-database/IDItY72Ze6U/pP4lgfT8S1UJ
>>
>> But when I do a REQUEST_COMMAND query that selects a single field of the 
>> record, "select make from Carz", I get back a serialized record that 
>> appears to match the documented Schemaless Serialization exactly, rather 
>> than the "alternative" serialization format (I don't know what else to 
>> call it, since it appears to be undocumented).
>>
>> Why is there an inconsistency?  It's unclear to me what is going on.
>>
>> Here's a breakdown of what the server is sending back:
>>
>>     For query: select * from Carz
>>
>>     Read 39 bytes:  q select * from Carz [OChannelBinaryServer]
>>     Writing byte (1 byte): 0 [OChannelBinaryServer]    -> status
>>     Writing int (4 bytes): 87 [OChannelBinaryServer]   -> session
>>     Writing byte (1 byte): 108 [OChannelBinaryServer]  -> result-type: 'l' (Collection)
>>     Writing int (4 bytes): 2 [OChannelBinaryServer]    -> result-set-size: 2
>>     Writing short (2 bytes): 0 [OChannelBinaryServer]  -> short=0 (means "record", not null or "RID")
>>     Writing byte (1 byte): 100 [OChannelBinaryServer]  -> record-type = 'd'
>>     Writing short (2 bytes): 13 [OChannelBinaryServer] -> cluster-id (13)
>>     Writing long (8 bytes): 0 [OChannelBinaryServer]   -> cluster-pos, so rid is #13:0
>>     Writing int (4 bytes): 1 [OChannelBinaryServer]    -> record-version
>>     Writing bytes (4+30=34 bytes):
>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0, 10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] [OChannelBinaryServer]
>>     Writing short (2 bytes): 0 [OChannelBinaryServer]
>>     Writing byte (1 byte): 100 [OChannelBinaryServer]
>>     Writing short (2 bytes): 13 [OChannelBinaryServer]
>>     Writing long (8 bytes): 1 [OChannelBinaryServer]
>>     Writing int (4 bytes): 1 [OChannelBinaryServer]
>>     Writing bytes (4+33=37 bytes):
>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 27, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116, 10, 84, 97, 104, 111, 101] [OChannelBinaryServer]
>>     Writing byte (1 byte): 0 [OChannelBinaryServer]
>>     Flush [OChannelBinaryServer]
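The response framing in the trace above can be read with `java.io.DataInputStream`, whose `readShort`/`readInt`/`readLong` are big-endian, matching the values in the log. This is a sketch that only handles the shape seen in these two traces (status 0, result type 'l', every entry a full record marked by a short 0); `CommandResponse` and `Rec` are hypothetical names, and the trailing 0 byte is simply consumed because the thread does not explain it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the REQUEST_COMMAND response shape traced above.
final class CommandResponse {
    // One record from the result set, as laid out in the trace.
    record Rec(char recordType, short clusterId, long clusterPos, int version, byte[] content) {}

    static List<Rec> parse(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        byte status = in.readByte();                 // 0 in both traces
        in.readInt();                                // session id (87 / 85 in the traces)
        byte resultType = in.readByte();             // 'l' = collection
        if (status != 0 || resultType != 'l') {
            throw new IOException("sketch only handles status 0 with result type 'l'");
        }
        int count = in.readInt();                    // result-set-size
        List<Rec> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            short marker = in.readShort();           // 0 = full record (not null, not a RID)
            if (marker != 0) {
                throw new IOException("sketch only handles full records");
            }
            char recordType = (char) in.readByte();  // 'd' = document
            short clusterId = in.readShort();        // 13 for select *, -2 for select make
            long clusterPos = in.readLong();
            int version = in.readInt();
            byte[] content = new byte[in.readInt()]; // length-prefixed serialized record
            in.readFully(content);
            out.add(new Rec(recordType, clusterId, clusterPos, version, content));
        }
        in.readByte();                               // trailing 0 byte, consumed and ignored
        return out;
    }
}
```

Because the cluster-id is read as a signed short, the -2 seen in the projection query comes back as -2 rather than 65534, which keeps it easy to test for in a driver.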
>>
>>     
>> Analyzing the first serialized record, I see the "alternative" 
>> serialization format:
>>
>>    Version
>>    |---|-----Classname------|--------------Header-----------------| ...
>>         len |---- string ---| PID <----ptr--> PID <----ptr---> EOH
>>          4   C  a    r    z
>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0,
>> idx:  0  1   2   3    4    5   6  7  8  9  10  11 12 13 14  15 16 
>>
>>    |---------------------------Data-------------------------| 
>>    |len |-------string------| len |---------string----------| 
>>      5   H   o    n    d   a    6  A   c   c    o    r    d
>>     10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] 
>>     17  18                 22  23  24                     29  
>>
>>
>> The header here is the "alternative" one: instead of regular zigzag 
>> encoding, it uses the formula:
>>
>>     zigzagEncode( (fieldId+1) * -1 )
>>     
>> to encode the Property/field ID, and does not include the name of the 
>> Property/field.
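The formula can be checked against the bytes above with a short sketch. For values this small the encoded number fits in a single varint byte, so the header bytes 47 and 49 are the encoded values themselves. The class and method names are hypothetical; the decode direction mirrors the `len < 0` branch in Chris's snippet, assuming the id is recovered as `-len - 1`.

```java
// Hypothetical helpers checking the formula against the header bytes 47 and 49.
final class GlobalPropertyId {
    // Standard zigzag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static long zigzagEncode(int n) {
        return ((long) n << 1) ^ (n >> 31);
    }

    static int zigzagDecode(long raw) {
        return (int) ((raw >>> 1) ^ -(raw & 1));
    }

    // zigzagEncode( (fieldId+1) * -1 ), as described above.
    static long encodeFieldId(int fieldId) {
        return zigzagEncode((fieldId + 1) * -1);
    }

    // Inverse: a negative decoded length -n refers to field id n - 1.
    static int decodeFieldId(long raw) {
        int len = zigzagDecode(raw);
        if (len >= 0) {
            throw new IllegalArgumentException("not a global property reference: " + raw);
        }
        return -len - 1;
    }
}
```

With these helpers, `encodeFieldId(23)` gives 47 and `decodeFieldId(49)` gives 24, matching the two header entries of the full-document record.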
>>
>>
>>
>> Compare that to:
>>     
>>     For query: select make from Carz
>>     Read 42 bytes:  q select make from Carz [OChannelBinaryServer]
>>     Writing byte (1 byte): 0 [OChannelBinaryServer]    -> status
>>     Writing int (4 bytes): 85 [OChannelBinaryServer]   -> session
>>     Writing byte (1 byte): 108 [OChannelBinaryServer]  -> result-type: 'l' (Collection)
>>     Writing int (4 bytes): 2 [OChannelBinaryServer]    -> result-set-size: 2
>>     Writing short (2 bytes): 0 [OChannelBinaryServer]  -> short=0 (means "record", not null or "RID")
>>     Writing byte (1 byte): 100 [OChannelBinaryServer]  -> record-type = 'd'
>>     Writing short (2 bytes): -2 [OChannelBinaryServer] -> cluster-id -2 => means ????
>>     Writing long (8 bytes): 0 [OChannelBinaryServer]   -> cluster-pos ??
>>     Writing int (4 bytes): 0 [OChannelBinaryServer]    -> record-version
>>     Writing bytes (4+19=23 bytes):
>>         V  ?  4   m    a    k    e <----ptr--->  ?  ?  5   H    o    n    d   a
>>        [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97] [OChannelBinaryServer]
>>     idx 0                   5                10        13
>>     Writing short (2 bytes): 0 [OChannelBinaryServer]
>>     Writing byte (1 byte): 100 [OChannelBinaryServer]
>>     Writing short (2 bytes): -2 [OChannelBinaryServer]
>>     Writing long (8 bytes): 1 [OChannelBinaryServer]   -> cluster-pos ??
>>     Writing int (4 bytes): 0 [OChannelBinaryServer]
>>     Writing bytes (4+23=27 bytes):
>>       [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116] [OChannelBinaryServer]
>>     Writing byte (1 byte): 0 [OChannelBinaryServer]
>>     Flush [OChannelBinaryServer]
>>     
>>
>> Analyzing the first serialized record here, I see what looks like the 
>> serialization format documented at:
>>
>> https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md
>>
>>    |-|--|--------------- Header -----------------|---------- Data ---------|
>>     V CN  4   m    a    k    e <----ptr--->  ?  ?  5   H    o    n    d   a
>>    [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97]
>> idx 0  1  2   3              6  7        10 11 12  13  14                 18
>>     
>>     
>> idx 0    : serialization version (0)
>> idx 1    : classname => string of length 0, no classname
>> idx 2    : "normally" encoded varint (rather than the strange version used for the full Document), sz = 4
>> idx 3-6  : bytes for "make" (the field name)
>> idx 7-10 : int - ptr to data (13)
>> idx 11   : data_type (7 = string)
>> idx 12   : 0 = end of header, I guess
>> idx 13   : "normally" encoded varint, sz = 5
>> idx 14-18: bytes for "Honda", the value for field "make"
>>
>> So here the Header uses regular zigzag encoding and gives the field name, 
>> NOT the field id.
>>
>>
>> So to summarize my questions:
>>
>> * is my interpretation of the serialized formats above correct?
>> * why are we using two different serialization formats?
>> * how is my driver to know which serialization format is being returned? 
>> The only difference is that the cluster-id is -2. Is that the indicator? I 
>> haven't found any documentation on what that value means.
>> * In the previous discussion, Emanuel said "we probably will remove the 
>> schemafull serialization over network soon." Is one of these the 
>> "schemafull" one?
>>
>> Thanks again for your help,
>> -Michael
>>
>>
