[orientdb] Re: Differing record serialization formats in binary protocol?

Christian Kramer Sat, 14 Feb 2015 23:54:07 -0800

Hey Michael, 

the -2 in the cluster-id marks a "null record"; see:


private static final ORecordId NULL_RECORD_ID = new ORecordId(-2, ORID.CLUSTER_POS_INVALID);
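A driver could flag such entries by their cluster id alone. Below is a minimal Java sketch of that check. The class and method names are hypothetical; the only value taken from the source is -2. The sketch tests only the cluster id. The traces further down show cluster-pos values of 0 and 1 next to -2, not an invalid-position marker.

```java
/** Hypothetical driver helper: flags record ids that use the "null record" cluster id. */
final class RecordIdCheck {

    // Cluster id of ORecordId.NULL_RECORD_ID, taken from the snippet in this message.
    static final short NULL_RECORD_CLUSTER_ID = -2;

    // Tests the cluster id only; the cluster-pos is deliberately not checked.
    static boolean isNullRecordCluster(short clusterId) {
        return clusterId == NULL_RECORD_CLUSTER_ID;
    }

    // Formats the id the way the console prints RIDs (#cluster:position).
    static String describe(short clusterId, long clusterPos) {
        String rid = "#" + clusterId + ":" + clusterPos;
        return isNullRecordCluster(clusterId) ? rid + " (null-record cluster)" : rid;
    }
}
```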



Cheers, 
Chris


On Saturday, February 14, 2015 at 21:02:56 UTC+1, Michael Peterson wrote:
>
> Hi Chris,
>
> Thanks for the quick reply and the pointer to the code. I believe that 
> answers my first and third questions:
> 1. Was my interpretation of the serialization formats correct?  *Yes*
> 3. How do I tell the difference between the two forms?  The answer is 
> *whether 
> the zigzag-decoded length at the start of each header entry is positive 
> (you have a property) or negative (you have a Document).*
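The sign test can be sketched in Java as follows. The sketch assumes the protobuf-style varint used by the schemaless format: 7 data bits per byte, least-significant group first, high bit set on every byte but the last. Zigzag decoding follows. The class name HeaderEntryKind is hypothetical. The three branches follow ORecordSerializerBinaryV0#deserialize:

- 0 ends the header.
- A positive value is the length of an inline field name.
- A negative value refers to a global property.

```java
import java.nio.ByteBuffer;

/** Sketch: decodes one header-entry length and classifies it. Names are hypothetical. */
final class HeaderEntryKind {

    // Reads a varint (7 data bits per byte, least-significant group first,
    // high bit = "more bytes follow") and zigzag-decodes it.
    static long readSignedVarLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        // zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Mirrors the three branches of ORecordSerializerBinaryV0#deserialize.
    static String classify(long len) {
        if (len == 0) {
            return "end of header";
        } else if (len > 0) {
            return "named field, name length " + len;
        } else {
            // inverse of zigzagEncode((fieldId + 1) * -1)
            return "global property id " + (-len - 1);
        }
    }
}
```

Two bytes from the traces illustrate the cases:

- 47 decodes to -24, which is global property 23.
- 8 decodes to 4, a four-byte inline name such as make.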
>
> I would still like to know what "-2" means as the cluster-id for a 
> property; it seems to be hardcoded for properties. The long value that 
> follows it (cluster-pos) is not hardcoded, though. Does that value refer 
> to an actual cluster-pos, or is it just an incrementing counter for the 
> property "rows" being returned?
>
> And finally, I'm still wondering what Emanuel meant when he said that the 
> "schemafull serialization over network" will probably be removed soon. I 
> don't know which serialization format that refers to. Is it the one that 
> is being used for Properties (but not Documents)?
>
> Thanks very much,
> Michael
>
>
> On Saturday, February 14, 2015 at 10:57:20 AM UTC-5, Christian Kramer 
> wrote:
>>
>> Hey, 
>>
>> as I see it, with the first query you select a whole record/document, and 
>> with the second you select a single property. So the result of the first 
>> query is represented as records, and the result of the second as 
>> properties. See the 
>> orientdb/core/src/main/java/com/orientechnologies/orient/core/serialization/serializer/record/binary/ORecordSerializerBinaryV0.java#deserialize
>> method:
>>
>> if (len == 0) {
>>   // SCAN COMPLETED
>>   break;
>> } else if (len > 0) {
>>   // PARSE FIELD NAME
>>   fieldName = new String(bytes.bytes, bytes.offset, len, utf8);
>>   bytes.skip(len);
>>   valuePos = readInteger(bytes);
>>   type = readOType(bytes);
>> } else {
>>   // LOAD GLOBAL PROPERTY BY ID
>>   OGlobalProperty prop = getGlobalProperty(document, len);
>>   fieldName = prop.getName();
>>   valuePos = readInteger(bytes);
>>   if (prop.getType() != OType.ANY)
>>     type = prop.getType();
>>   else
>>     type = readOType(bytes);
>> }
>>
>>
>>
>>
>> Cheers, 
>> Chris
>>
>> On Saturday, February 14, 2015 at 14:52:43 UTC+1, Michael Peterson wrote:
>>>
>>> Hello,
>>>
>>> I am continuing to work on a Go client and trying to implement the 
>>> Network Binary Protocol, but I've hit another server response I don't 
>>> understand.
>>>
>>> I am running a synchronous REQUEST_COMMAND query against a Document (not 
>>> using Graphs yet). When I query for the full document, the serialized 
>>> record seems to be in a slightly different format than when I query for 
>>> a single field of the record.
>>>
>>> For background, here's the query in the Java shell:
>>>
>>>     orientdb {db=cars}> select * from Carz;   
>>>     ----+-----+------+---------+------
>>>     #   |@RID |@CLASS|make     |model 
>>>     ----+-----+------+---------+------
>>>     0   |#13:0|Carz  |Honda    |Accord
>>>     1   |#13:1|Carz  |Chevrolet|Tahoe 
>>>     ----+-----+------+---------+------
>>>     
>>>     2 item(s) found. Query executed in 0.006 sec(s).
>>>     orientdb {db=cars}> select make from Carz;
>>>     ----+------+---------
>>>     #   |@CLASS|make     
>>>     ----+------+---------
>>>     0   |null  |Honda    
>>>     1   |null  |Chevrolet
>>>     ----+------+---------
>>>
>>>
>>> When I do a REQUEST_COMMAND query with the query "select * from Carz", I 
>>> get back a serialized record that does not quite follow the Schemaless 
>>> Serialization 
>>> (https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md).
>>> Instead I get back the "alternative" serialization format I outlined in my 
>>> previous posting: 
>>> https://groups.google.com/d/msg/orient-database/IDItY72Ze6U/pP4lgfT8S1UJ
>>>
>>> But when I do a REQUEST_COMMAND query that selects a single field of the 
>>> record, "select make from Carz", I get back a serialized record that 
>>> seems to match the documented Schemaless Serialization exactly, rather 
>>> than the "alternative" serialization format (I don't know what else to 
>>> call it, since it appears to be undocumented).
>>>
>>> Why is there an inconsistency?  It's unclear to me what is going on.
>>>
>>> Here's the breakdown with what the server is sending back:
>>>
>>>     For query: select * from Carz
>>>
>>>     Read 39 bytes:  q select * from Carz [OChannelBinaryServer]
>>>     Writing byte (1 byte): 0 [OChannelBinaryServer]    -> status
>>>     Writing int (4 bytes): 87 [OChannelBinaryServer]   -> session
>>>     Writing byte (1 byte): 108 [OChannelBinaryServer]  -> result-type: 'l' (Collection)
>>>     Writing int (4 bytes): 2 [OChannelBinaryServer]    -> result-set-size: 2
>>>     Writing short (2 bytes): 0 [OChannelBinaryServer]  -> short=0 (means "record", not null or "RID")
>>>     Writing byte (1 byte): 100 [OChannelBinaryServer]  -> record-type = 'd'
>>>     Writing short (2 bytes): 13 [OChannelBinaryServer] -> cluster-id (13)
>>>     Writing long (8 bytes): 0 [OChannelBinaryServer]   -> cluster-pos, so rid is #13:0
>>>     Writing int (4 bytes): 1 [OChannelBinaryServer]    -> record-version
>>>     Writing bytes (4+30=34 bytes):
>>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0, 10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] [OChannelBinaryServer]
>>>     Writing short (2 bytes): 0 [OChannelBinaryServer]
>>>     Writing byte (1 byte): 100 [OChannelBinaryServer]
>>>     Writing short (2 bytes): 13 [OChannelBinaryServer]
>>>     Writing long (8 bytes): 1 [OChannelBinaryServer]
>>>     Writing int (4 bytes): 1 [OChannelBinaryServer]
>>>     Writing bytes (4+33=37 bytes):
>>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 27, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116, 10, 84, 97, 104, 111, 101] [OChannelBinaryServer]
>>>     Writing byte (1 byte): 0 [OChannelBinaryServer]
>>>     Flush [OChannelBinaryServer]
>>>
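Below is a Java sketch of reading the reply traced above. The field order is taken directly from Michael's annotations. Big-endian byte order (what Java's DataOutputStream writes) is assumed. CommandReplySketch and RawRecord are hypothetical names. Only result type 'l' and the short=0 "full record" marker are handled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Sketch of parsing a REQUEST_COMMAND reply with result type 'l'. Names are hypothetical. */
final class CommandReplySketch {

    static final class RawRecord {
        final char recordType;   // 'd' = document
        final short clusterId;
        final long clusterPos;
        final int version;
        final byte[] content;    // serialized record, still to be decoded

        RawRecord(char recordType, short clusterId, long clusterPos, int version, byte[] content) {
            this.recordType = recordType;
            this.clusterId = clusterId;
            this.clusterPos = clusterPos;
            this.version = version;
            this.content = content;
        }
    }

    // ByteBuffer is big-endian by default, which is the assumed wire order.
    static List<RawRecord> readCollection(ByteBuffer buf) {
        byte status = buf.get();
        if (status != 0) {
            throw new IllegalStateException("error status " + status + " is not handled here");
        }
        buf.getInt();                                  // session id, unused in this sketch
        byte resultType = buf.get();
        if (resultType != 'l') {
            throw new IllegalArgumentException("only result type 'l' is handled");
        }
        int size = buf.getInt();
        List<RawRecord> records = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            short marker = buf.getShort();
            if (marker != 0) {
                // null and RID-only entries use other markers; not handled here
                throw new IllegalArgumentException("unsupported record marker " + marker);
            }
            char recordType = (char) buf.get();
            short clusterId = buf.getShort();
            long clusterPos = buf.getLong();
            int version = buf.getInt();
            byte[] content = new byte[buf.getInt()];   // 4-byte length prefix
            buf.get(content);
            records.add(new RawRecord(recordType, clusterId, clusterPos, version, content));
        }
        return records;                                // any trailing byte is left unread
    }
}
```

The record content bytes are returned undecoded. The trailing 0 byte from the trace is left in the buffer for the caller.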
>>>     
>>> Analyzing the first serialized record, this is the "alternative" 
>>> serialized format:
>>>
>>>    Version
>>>    |---|-----Classname------|--------------Header-----------------| ...
>>>         len |---- string ---| PID <----ptr--> PID <----ptr---> EOH
>>>          4   C  a    r    z       n                               
>>>      [0, 8, 67, 97, 114, 122, 47, 0, 0, 0, 17, 49, 0, 0, 0, 23, 0,
>>> idx:  0  1   2   3    4    5   6  7  8  9  10  11 12 13 14  15 16 
>>>
>>>    |---------------------------Data-------------------------| 
>>>    |len |-------string------| len |---------string----------| 
>>>      5   H   o    n    d   a       A   c   c    o    r    d   
>>>     10, 72, 111, 110, 100, 97, 12, 65, 99, 99, 111, 114, 100] 
>>>     17  18                 22  23  24                     29  
>>>
>>>
>>> The header here is the "alternative" one: instead of regular zigzag 
>>> encoding it uses the formula:
>>>
>>>     zigzagEncode( (fieldId+1) * -1 )
>>>     
>>> to encode the Property/field ID, and does not include the name of the 
>>> Property/field.
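Below is a minimal Java decoder for this header, checked against the first record's bytes from the trace. It rests on three assumptions:

- Lengths use the same protobuf-style varint plus zigzag encoding.
- Value pointers count from the version byte, as Michael's idx annotations show.
- Each referenced property has a concrete schema type, so no type byte follows the pointer. The quoted deserialize code only reads one for OType.ANY.

GlobalPropertyHeader and its members are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: decodes a header whose fields are referenced by global property id. */
final class GlobalPropertyHeader {

    static final class Entry {
        final int propertyId;  // global property id recovered from the header
        final int valuePos;    // offset of the value, counted from the version byte
        Entry(int propertyId, int valuePos) {
            this.propertyId = propertyId;
            this.valuePos = valuePos;
        }
    }

    static final class Header {
        final String className;
        final List<Entry> entries;
        Header(String className, List<Entry> entries) {
            this.className = className;
            this.entries = entries;
        }
    }

    // Varint (7 bits per byte, low group first) followed by zigzag decoding.
    static long readSignedVarLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads a varint length followed by that many UTF-8 bytes.
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[(int) readSignedVarLong(buf)];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ASSUMPTION: no referenced property is of type ANY, so no type byte
    // follows the 4-byte value pointer (consistent with the traced bytes).
    static Header readHeader(ByteBuffer buf) {
        buf.get();                                    // serialization version
        String className = readString(buf);
        List<Entry> entries = new ArrayList<>();
        for (long len = readSignedVarLong(buf); len != 0; len = readSignedVarLong(buf)) {
            if (len > 0) {
                throw new IllegalArgumentException("inline field names are not handled here");
            }
            int propertyId = (int) (-len - 1);        // undoes zigzagEncode((fieldId + 1) * -1)
            entries.add(new Entry(propertyId, buf.getInt()));
        }
        return new Header(className, entries);
    }

    // Decodes a string value stored at an absolute offset in the record.
    static String readStringAt(byte[] record, int pos) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(pos);
        return readString(buf);
    }
}
```

On the 30-byte record above, this yields:

- Class Carz.
- Property ids 23 and 24, pointing at offsets 17 and 23.
- The strings Honda and Accord at those offsets.

A driver still has to resolve those ids against the database's global properties to recover the field names (judging by the values, make and model).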
>>>
>>>
>>>
>>> Compare that to:
>>>     
>>>     select make from Carz
>>>     Read 42 bytes:  q select make from Carz [OChannelBinaryServer]
>>>     Writing byte (1 byte): 0 [OChannelBinaryServer]    -> status
>>>     Writing int (4 bytes): 85 [OChannelBinaryServer]   -> session
>>>     Writing byte (1 byte): 108 [OChannelBinaryServer]  -> result-type: 'l' (Collection)
>>>     Writing int (4 bytes): 2 [OChannelBinaryServer]    -> result-set-size: 2
>>>     Writing short (2 bytes): 0 [OChannelBinaryServer]  -> short=0 (means "record", not null or "RID")
>>>     Writing byte (1 byte): 100 [OChannelBinaryServer]  -> record-type = 'd'
>>>     Writing short (2 bytes): -2 [OChannelBinaryServer] -> cluster-id -2 => means ????
>>>     Writing long (8 bytes): 0 [OChannelBinaryServer]   -> cluster-pos ??
>>>     Writing int (4 bytes): 0 [OChannelBinaryServer]    -> record-version
>>>     Writing bytes (4+19=23 bytes):
>>>         V  ?  4  m    a   k    e    <----ptr---> ?  ?  5   H   o    n    d    a
>>>        [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97] [OChannelBinaryServer]
>>>    idx  0                 5                  10        13
>>>     Writing short (2 bytes): 0 [OChannelBinaryServer]
>>>     Writing byte (1 byte): 100 [OChannelBinaryServer]
>>>     Writing short (2 bytes): -2 [OChannelBinaryServer]
>>>     Writing long (8 bytes): 1 [OChannelBinaryServer]    -> cluster-pos ??
>>>     Writing int (4 bytes): 0 [OChannelBinaryServer]
>>>     Writing bytes (4+23=27 bytes):
>>>       [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 18, 67, 104, 101, 118, 114, 111, 108, 101, 116] [OChannelBinaryServer]
>>>     Writing byte (1 byte): 0 [OChannelBinaryServer]
>>>     Flush [OChannelBinaryServer]
>>>     
>>>
>>> Analyzing the first serialized record, this looks like the serialization 
>>> format documented at:
>>>
>>> https://raw.githubusercontent.com/wiki/orientechnologies/orientdb/Record-Schemaless-Binary-Serialization.md
>>>
>>>    |-|---|----------------- Header ---------------|---------- Data ----------|
>>>     V  CN 4  m    a   k    e    <----ptr---> ?  ?  5   H   o    n    d    a
>>>    [0, 0, 8, 109, 97, 107, 101, 0, 0, 0, 13, 7, 0, 10, 72, 111, 110, 100, 97]
>>> idx 0  1  2  3             6    7        10  11 12 13  14                 18
>>>
>>> idx 0   : serialization version (0)
>>> idx 1   : classname => string of length 0, no classname
>>> idx 2   : "normally" encoded varint (rather than the strange version used for the full Document), sz = 4
>>> idx 3-6 : bytes for "make" (the field name)
>>> idx 7-10: int - ptr to data
>>> idx 11  : data_type (7 = string)
>>> idx 12  : ? end of header, I guess
>>> idx 13  : "normally" encoded varint, sz = 5
>>> idx 14-18: bytes for "Honda", the value for field "make"
>>>
>>> So here the Header uses regular zigzag encoding and gives the field 
>>> name, NOT the field id.
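For comparison, here is a Java sketch of the named-field form, checked against both "select make from Carz" records. It assumes protobuf-style varints with zigzag decoding and value pointers counted from the version byte. The class name NamedFieldHeader is hypothetical. The type byte is returned as a raw OType id (7 is STRING, per Michael's annotation).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch: decodes a header whose entries carry the field name inline. */
final class NamedFieldHeader {

    static final class Field {
        final String name;
        final int valuePos;  // offset of the value, counted from the version byte
        final int typeId;    // raw OType id; 7 is STRING
        Field(String name, int valuePos, int typeId) {
            this.name = name;
            this.valuePos = valuePos;
            this.typeId = typeId;
        }
    }

    // Varint (7 bits per byte, low group first) followed by zigzag decoding.
    static long readSignedVarLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads a varint length followed by that many UTF-8 bytes.
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[(int) readSignedVarLong(buf)];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads the version byte and class name (empty in this trace), then the entries.
    static List<Field> readHeader(ByteBuffer buf) {
        buf.get();                                    // serialization version
        readString(buf);                              // class name, zero-length here
        List<Field> fields = new ArrayList<>();
        for (long len = readSignedVarLong(buf); len != 0; len = readSignedVarLong(buf)) {
            if (len < 0) {
                throw new IllegalArgumentException("global property ids are not handled here");
            }
            byte[] name = new byte[(int) len];
            buf.get(name);
            int valuePos = buf.getInt();
            int typeId = buf.get();                   // one type byte per named entry
            fields.add(new Field(new String(name, StandardCharsets.UTF_8), valuePos, typeId));
        }
        return fields;
    }

    // Decodes a string value stored at an absolute offset in the record.
    static String readStringAt(byte[] record, int pos) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(pos);
        return readString(buf);
    }
}
```

Here the entry length is positive (8 decodes to 4), so the name make travels inline and a type byte follows the pointer. With the second record's bytes, readStringAt(record, 13) returns Chevrolet.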
>>>
>>>
>>> So to summarize my questions:
>>>
>>> * is my interpretation of the serialized formats above correct?
>>> * why are we using two different serialization formats?
>>> * how is my driver to know which serialization format is being 
>>> returned?  The only difference is the cluster-id is -2.  Is that the 
>>> indicator?  I haven't located any documentation as to what that value means.
>>> * In the previous discussion, Emanuel said "we probably will remove the 
>>> schemafull serialization over network soon."  Is one of these the 
>>> "schemafull" schema?
>>>
>>> Thanks again for your help,
>>> -Michael
>>>
>>>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
