[orientdb] Differing record serialization formats in binary protocol?

Michael Peterson Sat, 14 Feb 2015 05:53:26 -0800

Hello,

I am continuing to work on a Go client and trying to implement the Network 
Binary Protocol, but I've hit another server response I don't understand.


I am doing a query using REQUEST_COMMAND (synchronous) and querying a 
Document (not using Graphs yet).  When I query for the full document the 
serialized record seems to be of a slightly different format than when I 
query for a field of the record.

For background, here's the query in the Java shell:

    orientdb {db=cars}> select * from Carz;   
    ----+-----+------+---------+------
    #   |@RID |@CLASS|make     |model 
    ----+-----+------+---------+------
    0   |#13:0|Carz  |Honda    |Accord
    1   |#13:1|Carz  |Chevrolet|Tahoe 
    ----+-----+------+---------+------
    
    2 item(s) found. Query executed in 0.006 sec(s).
    orientdb {db=cars}> select make from Carz;
    ----+------+---------
    #   |@CLASS|make     
    ----+------+---------
    0   |null  |Honda    
    1   |null  |Chevrolet
    ----+------+---------


When I do a REQUEST_COMMAND query with the query "select * from Carz" I get 
back a serialized record that does not quite follow the Schemaless 
Serialization (
https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md).
  
Instead I get back the "alternative" serialization format I outlined in my 
previous posting: 
https://groups.google.com/d/msg/orient-database/IDItY72Ze6U/pP4lgfT8S1UJ

But when I do a REQUEST_COMMAND query with the query that selects a field 
of the record: "select make from Carz", I now get back a serialized record 
that looks like it exactly matches the documented Schemaless Serialization, 
rather than the "alternative" serialization format (I don't know what else 
to call it, since it appears to be undocumented?)  

Why is there an inconsistency?  It's unclear to me what is going on.

Here's the breakdown with what the server is sending back:

    For query: select * from Carz

    Read 39 bytes:  q select * from Carz���� [OChannelBinaryServer]
    Writing byte (1 byte): 0 [OChannelBinaryServer]   -> status
    Writing int (4 bytes): 87 [OChannelBinaryServer]  -> session
    Writing byte (1 byte): 108 [OChannelBinaryServer] -> result-type: 'l' (Collection)
    Writing int (4 bytes): 2 [OChannelBinaryServer]   -> result-set-size: 2
    Writing short (2 bytes): 0 [OChannelBinaryServer] -> short=0 (means "record", not null or "RID")
    Writing byte (1 byte): 100 [OChannelBinaryServer] -> record-type = 'd'
    Writing short (2 bytes): 13 [OChannelBinaryServer]-> cluster-id (13)
    Writing long (8 bytes): 0 [OChannelBinaryServer]  -> cluster-pos, so rid is #13:0
    Writing int (4 bytes): 1 [OChannelBinaryServer]   -> record-version
    Writing bytes (4+30=34 bytes):
     [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0, 10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] [OChannelBinaryServer]
    Writing short (2 bytes): 0 [OChannelBinaryServer]
    Writing byte (1 byte): 100 [OChannelBinaryServer]
    Writing short (2 bytes): 13 [OChannelBinaryServer]
    Writing long (8 bytes): 1 [OChannelBinaryServer]
    Writing int (4 bytes): 1 [OChannelBinaryServer]
    Writing bytes (4+33=37 bytes):
     [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 27, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116, 10, 84, 97, 104, 111, 101] [OChannelBinaryServer]
    Writing byte (1 byte): 0 [OChannelBinaryServer]
    Flush [OChannelBinaryServer]
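
For reference, here is a minimal Go sketch of how a client might read one record out of that trace. It assumes big-endian integers (which the 4-byte pointers inside the record bytes suggest) and only handles the short=0 "full record" case. The names Record, readRecord, decodeRecord and firstRecordWire are illustrative and not part of any OrientDB API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Record holds the per-record fields seen in the server trace.
// Hypothetical type for illustration only.
type Record struct {
	Type       byte // 'd' (100) for a document
	ClusterID  int16
	ClusterPos int64
	Version    int32
	Content    []byte // serialized record, still to be decoded
}

// readRecord reads one entry of an 'l' (collection) result from r.
func readRecord(r io.Reader) (*Record, error) {
	var marker int16
	if err := binary.Read(r, binary.BigEndian, &marker); err != nil {
		return nil, err
	}
	if marker != 0 {
		// Other marker values (null, RID only) are not handled in this sketch.
		return nil, fmt.Errorf("unhandled record marker %d", marker)
	}
	// Fixed-size fields in wire order; binary.Read adds no padding.
	var fixed struct {
		Type       byte
		ClusterID  int16
		ClusterPos int64
		Version    int32
		Size       int32 // the "4" in "4+30=34 bytes"
	}
	if err := binary.Read(r, binary.BigEndian, &fixed); err != nil {
		return nil, err
	}
	if fixed.Size < 0 {
		return nil, fmt.Errorf("negative content size %d", fixed.Size)
	}
	content := make([]byte, fixed.Size)
	if _, err := io.ReadFull(r, content); err != nil {
		return nil, err
	}
	return &Record{
		Type:       fixed.Type,
		ClusterID:  fixed.ClusterID,
		ClusterPos: fixed.ClusterPos,
		Version:    fixed.Version,
		Content:    content,
	}, nil
}

// decodeRecord is a convenience wrapper for an in-memory buffer.
func decodeRecord(b []byte) (*Record, error) {
	return readRecord(bytes.NewReader(b))
}

// firstRecordWire is the first record of "select * from Carz",
// rebuilt by hand from the trace above.
var firstRecordWire = []byte{
	0, 0, // short=0: a full record follows
	100,   // record-type 'd'
	0, 13, // cluster-id
	0, 0, 0, 0, 0, 0, 0, 0, // cluster-pos
	0, 0, 0, 1, // record-version
	0, 0, 0, 30, // content length
	0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0, 10, 72,
	111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100,
}

func main() {
	rec, err := decodeRecord(firstRecordWire)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("#%d:%d version=%d content=%d bytes\n",
		rec.ClusterID, rec.ClusterPos, rec.Version, len(rec.Content))
}
```

The Content slice is left undecoded on purpose, since how to interpret it is exactly the open question here.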

    
Analyzing the first serialized record, this is the "alternative" serialized 
format:

   Version
   |---|-----Classname------|--------------Header-----------------| ...
        len |---- string ---| PID <----ptr--> PID <----ptr---> EOH
         4   C  a    r    z       n                               
     [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0,
idx:  0  1   2   3    4    5   6  7  8  9  10  11 12 13 14  15 16 

   |---------------------------Data-------------------------| 
   |len |-------string------| len |---------string----------| 
     5   H   o    n    d   a   6   A   c   c    o    r    d   
    10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] 
    17  18                 22  23  24                     29  


The header here is the "alternative" one - instead of regular zigzag 
encoding it uses the formula:

    zigzagEncode( (fieldId+1) * -1 )
    
to encode the Property/field ID, and does not include the name of the 
Property/field.



Compare that to:
    
    select make from Carz
    Read 42 bytes:  q select make from Carz���� [OChannelBinaryServer]
    Writing byte (1 byte): 0 [OChannelBinaryServer]    -> status
    Writing int (4 bytes): 85 [OChannelBinaryServer]   -> session
    Writing byte (1 byte): 108 [OChannelBinaryServer]  -> result-type: 'l' (Collection)
    Writing int (4 bytes): 2 [OChannelBinaryServer]    -> result-set-size: 2
    Writing short (2 bytes): 0 [OChannelBinaryServer]  -> short=0 (means "record", not null or "RID")
    Writing byte (1 byte): 100 [OChannelBinaryServer]  -> record-type = 'd'
    Writing short (2 bytes): -2 [OChannelBinaryServer] -> cluster-id -2 => means ????
    Writing long (8 bytes): 0 [OChannelBinaryServer]   -> cluster-pos ??
    Writing int (4 bytes): 0 [OChannelBinaryServer]    -> record-version
    Writing bytes (4+19=23 bytes):
        V  ?  4   m    a    k    e <----ptr--->  ?  ?  5   H    o    n    d   a
       [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97] [OChannelBinaryServer]
    idx 0                   5                10        13
    Writing short (2 bytes): 0 [OChannelBinaryServer]
    Writing byte (1 byte): 100 [OChannelBinaryServer]
    Writing short (2 bytes): -2 [OChannelBinaryServer]
    Writing long (8 bytes): 1 [OChannelBinaryServer]    -> cluster-pos ??
    Writing int (4 bytes): 0 [OChannelBinaryServer]
    Writing bytes (4+23=27 bytes):
      [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116] [OChannelBinaryServer]
    Writing byte (1 byte): 0 [OChannelBinaryServer]
    Flush [OChannelBinaryServer]
    

Analyzing the first serialized record, this looks like the serialization 
format documented at:
https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md

   |-|--|--------------- Header -----------------|---------- Data ---------|
    V CN  4   m    a    k    e <----ptr--->  ?  ?  5   H    o    n    d   a
   [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97]
idx 0  1  2   3              6  7        10 11 12  13  14                 18
    
    
idx 0   : serialization version (0)
idx 1   : classname => string of length 0, no classname
idx 2   : "normally" encoded varint (rather than the strange version used 
for the full Document), sz = 4
idx 3-6 : bytes for "make" (the field name)
idx 7-10: int - ptr to data
idx 11  : data_type (7 = string)
idx 12  : ? end of header, I guess
idx 13  : "normally" encoded varint, sz = 5
idx 14-18: bytes for "Honda", the value for field "make"

So here the Header uses regular zigzag encoding and gives the field name, 
NOT the field id.
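
One way to read both byte dumps with a single header walker is to branch on the sign of each header varint: positive means a field name of that length follows, negative means a property ID, zero ends the header. The Go sketch below encodes only what the two dumps show, not confirmed server behaviour. In particular it assumes a data-type byte follows the pointer only for named fields, and that both values are strings. Field, parseHeader, stringAt, fullRec and projRec are illustrative names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Field is one header entry (hypothetical type for illustration).
type Field struct {
	Name   string // non-empty when the header carries the field name
	PropID int64  // property ID when the header carries one, else -1
	Ptr    int32  // offset of the value within the record
	Type   byte   // data-type byte; only read for named fields
}

var errTruncated = errors.New("truncated or malformed record")

// parseHeader walks the class name and the header entries of one record.
func parseHeader(rec []byte) (string, []Field, error) {
	if len(rec) < 1 {
		return "", nil, errTruncated
	}
	pos := 1 // index 0 is the serialization version
	varint := func() (int64, bool) {
		v, n := binary.Varint(rec[pos:])
		if n <= 0 {
			return 0, false
		}
		pos += n
		return v, true
	}
	take := func(n int64) ([]byte, bool) {
		if n < 0 || int64(pos)+n > int64(len(rec)) {
			return nil, false
		}
		b := rec[pos : pos+int(n)]
		pos += int(n)
		return b, true
	}
	clen, ok := varint()
	if !ok {
		return "", nil, errTruncated
	}
	name, ok := take(clen)
	if !ok {
		return "", nil, errTruncated
	}
	var fields []Field
	for {
		v, ok := varint()
		if !ok {
			return "", nil, errTruncated
		}
		if v == 0 { // end of header
			return string(name), fields, nil
		}
		f := Field{PropID: -1}
		if v > 0 { // field name of length v
			b, ok := take(v)
			if !ok {
				return "", nil, errTruncated
			}
			f.Name = string(b)
		} else { // zigzag((fieldId+1) * -1)
			f.PropID = -v - 1
		}
		p, ok := take(4)
		if !ok {
			return "", nil, errTruncated
		}
		f.Ptr = int32(binary.BigEndian.Uint32(p))
		if v > 0 { // type byte seen only after named fields in the dumps
			t, ok := take(1)
			if !ok {
				return "", nil, errTruncated
			}
			f.Type = t[0]
		}
		fields = append(fields, f)
	}
}

// stringAt reads a varint-length-prefixed string value at offset ptr.
func stringAt(rec []byte, ptr int32) (string, error) {
	if ptr < 0 || int(ptr) >= len(rec) {
		return "", errTruncated
	}
	v, n := binary.Varint(rec[ptr:])
	start := int(ptr) + n
	if n <= 0 || v < 0 || int64(start)+v > int64(len(rec)) {
		return "", errTruncated
	}
	return string(rec[start : start+int(v)]), nil
}

// First record of each query, copied from the traces above.
var fullRec = []byte{0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23,
	0, 10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100}
var projRec = []byte{0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72,
	111, 110, 100, 97}

func main() {
	for _, rec := range [][]byte{fullRec, projRec} {
		class, fields, err := parseHeader(rec)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, f := range fields {
			s, _ := stringAt(rec, f.Ptr)
			fmt.Printf("class=%q name=%q propID=%d value=%q\n", class, f.Name, f.PropID, s)
		}
	}
}
```

If this reading is right, a driver would not need the cluster-id to pick a format, since each header entry identifies itself. It would still need the schema to map a property ID back to a name and type.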


So to summarize my questions:

* is my interpretation of the serialized formats above correct?
* why are we using two different serialization formats?
* how is my driver to know which serialization format is being returned?  
The only difference is the cluster-id is -2.  Is that the indicator?  I 
haven't located any documentation as to what that value means.
* In the previous discussion, Emanuel said "we probably will remove the 
schemafull serialization over network soon."  Is one of these the 
"schemafull" serialization format?

Thanks again for your help,
-Michael
