Re: [orientdb] Re: REQUEST_RECORD_LOAD - sequential client implementation not possible?

GoorMoon Fri, 12 Dec 2014 08:52:07 -0800

What if you use a byte array to represent and manipulate variable-size 
integers?
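
To make the byte-array idea concrete, here is a minimal sketch that only ever touches 7 bits at a time. It is written in Python rather than PHP to keep it short, and it assumes the varint is the ZigZag + base-128 scheme (7-bit groups, least significant group first, high bit set on every byte except the last) used by Protocol Buffers. Whether OrientDB matches this exactly should be checked against the wiki page and the .NET driver linked below. All function names here are hypothetical.

```python
def zigzag_encode(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n * 2 if n >= 0 else -n * 2 - 1


def zigzag_decode(u: int) -> int:
    """Inverse of zigzag_encode."""
    return (u >> 1) ^ -(u & 1)


def write_varint(n: int) -> bytes:
    """Encode a signed integer as ZigZag + base-128 bytes (assumed scheme)."""
    u = zigzag_encode(n)
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        u >>= 7
    out.append(u)  # final byte, continuation bit clear
    return bytes(out)


def read_varint(buf: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return zigzag_decode(result), pos
```

Python integers are unbounded, so this sketch sidesteps the 32-bit question entirely. A PHP port on a 32-bit build would have to keep the accumulated value in something other than a native integer (for example a byte string or an arbitrary-precision extension) once it exceeds 31 bits.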


On Friday, December 12, 2014 4:56:57 PM UTC+2, mindplay.dk wrote:
>
> Yikes, I blew my entire day on this.
>
> Can't find a PHP implementation, and can't get a port to work, because PHP 
> only has one integer type, and it's a signed integer.
>
> What's worse, its size is platform-dependent and could be either 32-bit or 64-bit.
>
> I'm afraid we're at a dead end with this client, unless somebody else can 
> figure out how to read/write variable-size integers, or unless OrientDB 
> offers a protocol option for clients in languages that don't have proper 
> support for native numerical types... man, PHP stinks :-(
>
>
> On Fri, Dec 12, 2014 at 12:53 PM, Rasmus Schultz <[email protected]> wrote:
>>
>> I know, but that's hardly a specification - not enough to reference for 
>> an implementation.
>>
>> For now, I will try to port your implementation...
>>
>>
>> On Fri, Dec 12, 2014 at 12:49 PM, GoorMoon <[email protected]> wrote:
>>>
>>> You can look here:
>>> https://github.com/orientechnologies/orientdb/wiki/Record-Schemaless-Binary-Serialization#shortintegerlong
>>>
>>>
>>> On Friday, December 12, 2014 1:44:01 PM UTC+2, mindplay.dk wrote:
>>>>
>>>> Glad to hear that, thanks :-)
>>>>
>>>> So on a related note - the "varint" type used in the OrientDB binary 
>>>> protocol, what specification does it follow precisely? Because apparently 
>>>> there are lots 
>>>> <http://vpri.org/fonc_wiki/index.php/Variable_Length_Integer> of ways 
>>>> to encode a variable-size integer.
>>>>
>>>> I sort of wish there was an option for the client to disable 
>>>> variable-length integers in the protocol, instead encoding them with a 
>>>> fixed size.
>>>>
>>>> I can implement UTF-8 style reading/writing of variable-size integers 
>>>> in PHP, but this is going to add considerable CPU overhead. In the case 
>>>> of a PHP client (and probably other scripting languages too), a small 
>>>> amount of bandwidth overhead is likely preferable to CPU overhead. What 
>>>> we want is a fast client; whether that means using a little more 
>>>> bandwidth is probably secondary, as is the ability to support more than 
>>>> 2 billion records for most projects.
>>>>
>>>> Just putting that out there :-)
>>>>
>>>> But for the time being, can you point me to a specification or (better) 
>>>> a reference implementation (in any language) of the VLI encoding used by 
>>>> OrientDB?
>>>>
>>>> I can reference the one in GoorMoon's .NET driver 
>>>> <https://github.com/orientechnologies/OrientDB-NET.binary/blob/binary.serialization/src/Orient/Orient.Client/Protocol/Serializers/RecordBinarySerializer.cs#L502>,
>>>>  
>>>> or the one on Wikipedia <http://en.wikipedia.org/wiki/UTF-8>, but 
>>>> neither of them appears to have tests, and I'm unsure how to test them. Go 
>>>> has a nice implementation 
>>>> <http://golang.org/src/encoding/binary/varint.go> with tests 
>>>> <http://golang.org/src/encoding/binary/varint_test.go>, but since 
>>>> there are so many types of VLI which don't appear to have any official 
>>>> names or standardization, I can't be sure it's the same type of encoding...
>>>>
>>>>
>>>> On Fri, Dec 12, 2014 at 11:45 AM, Luca Garulli <[email protected]> 
>>>> wrote:
>>>>>
>>>>> Requests from driver authors get this kind of high priority :-)
>>>>>
>>>>> Lvc@
>>>>>
>>>>>
>>>>> On 11 December 2014 at 23:29, GoorMoon <[email protected]> wrote:
>>>>>
>>>>>> Glad to hear !!!
>>>>>>
>>>>>> On Thursday, December 11, 2014 11:41:05 PM UTC+2, mindplay.dk wrote:
>>>>>>>
>>>>>>> Never mind, spotted it :-)
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 11, 2014 at 10:03 PM, Rasmus Schultz <[email protected]> wrote:
>>>>>>>
>>>>>>>> That was a fast decision - very happy to see them reacting to this 
>>>>>>>> issue so quickly and slating it for the 2.0 release! :-)
>>>>>>>>
>>>>>>>> So I'm referencing a bunch of your code for my client now, and I'm 
>>>>>>>> hung up on a small issue, maybe you can point me in the right 
>>>>>>>> direction...
>>>>>>>>
>>>>>>>> The so-called "varint" type in the binary serialization - I 
>>>>>>>> understand it's a variable-size integer, encoded like UTF-8 character 
>>>>>>>> codes. Where or how do you handle this in your implementation?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 11, 2014 at 7:31 PM, Rasmus Schultz <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Posted here
>>>>>>>>>
>>>>>>>>> https://github.com/orientechnologies/orientdb/issues/3175
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 11, 2014 at 7:10 PM, GoorMoon <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with you, but this is not our decision.
>>>>>>>>>> I suggest you open an issue at 
>>>>>>>>>> https://github.com/orientechnologies/orientdb describing your proposal.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thursday, December 11, 2014 10:55:46 AM UTC+2, mindplay.dk 
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> This looks great, thanks! So much simpler than the CSV 
>>>>>>>>>>> serializer.
>>>>>>>>>>>
>>>>>>>>>>> I see that you do still have to buffer the record in memory, as 
>>>>>>>>>>> I suspected. I really do wish they would make the change I 
>>>>>>>>>>> suggested below, putting the record format before the actual 
>>>>>>>>>>> record. I think you then wouldn't need to buffer records in 
>>>>>>>>>>> memory before you can deserialize. Thoughts?
>>>>>>>>>>> On Dec 10, 2014 1:09 PM, "GoorMoon" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey,
>>>>>>>>>>>> I don't know about the PHP driver, but I contribute to the .NET driver: 
>>>>>>>>>>>> https://github.com/orientechnologies/OrientDB-NET.binary
>>>>>>>>>>>> I implemented a lot of the Binary Serializer's features, and may 
>>>>>>>>>>>> be able to help you with your question.
>>>>>>>>>>>> As for RECORD_LOAD, I don't have any problem with it and get the 
>>>>>>>>>>>> document in binary format.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, December 9, 2014 6:15:13 PM UTC+2, mindplay.dk 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> We are well aware that this is no small undertaking, but we 
>>>>>>>>>>>>> believe in OrientDB and we think it's worthwhile.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > one big step is to actually implement the 
>>>>>>>>>>>>> > serialization/deserialization of the document correctly from 
>>>>>>>>>>>>> > the binary serialization
>>>>>>>>>>>>>
>>>>>>>>>>>>> To my knowledge, that has not been done in PHP yet, by anyone? 
>>>>>>>>>>>>> All existing implementations, including the fork by Ostico, 
>>>>>>>>>>>>> support only the CSV style serialization. The binary 
>>>>>>>>>>>>> serialization format actually ought to be a lot easier to 
>>>>>>>>>>>>> implement, as it won't require a state machine/parser like the 
>>>>>>>>>>>>> CSV format. It should also be a lot more CPU friendly and memory 
>>>>>>>>>>>>> efficient, with less bandwidth overhead, so we're targeting that 
>>>>>>>>>>>>> exclusively.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're also targeting the most recent protocol, which already 
>>>>>>>>>>>>> differs substantially from what we were able to reference from 
>>>>>>>>>>>>> existing implementations, which are based on older versions of 
>>>>>>>>>>>>> the protocol. We hope to support the final version of the 
>>>>>>>>>>>>> protocol when OrientDB 2.0 is released; we do not want this 
>>>>>>>>>>>>> client library to only support a legacy protocol from inception.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As said though, it doesn't appear that REQUEST_RECORD_LOAD 
>>>>>>>>>>>>> respects the serializer setting; it appears to always return 
>>>>>>>>>>>>> records in the CSV format. If this is a missing feature or a 
>>>>>>>>>>>>> server-side issue, we won't get very far with our client anytime 
>>>>>>>>>>>>> soon... Either way, we need someone who can at least answer the 
>>>>>>>>>>>>> question and help set us on the right path.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At this moment, we are stalled, since we don't even know if 
>>>>>>>>>>>>> the server is behaving correctly, or whether we need to support 
>>>>>>>>>>>>> the CSV format or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Dec 9, 2014 at 4:57 PM, Emanuel <emanuele.t...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We also have other PHP drivers, such as this one: 
>>>>>>>>>>>>>> https://github.com/orientechnologies/php-orientdb 
>>>>>>>>>>>>>> which already has a few forks (for example: 
>>>>>>>>>>>>>> https://github.com/Ostico/PhpOrient ).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would say it is better to have fewer drivers that are more 
>>>>>>>>>>>>>> up to date, and I warn you, writing a driver from scratch is 
>>>>>>>>>>>>>> not as easy as it seems :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyway, you are free to do so ;)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One big step is to actually implement the 
>>>>>>>>>>>>>> serialization/deserialization of the document correctly from 
>>>>>>>>>>>>>> the binary serialization, which is quite complex and may also 
>>>>>>>>>>>>>> be a target of evolution/optimization in the not too distant 
>>>>>>>>>>>>>> future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here at Orient we are evaluating an easier way to read/write 
>>>>>>>>>>>>>> the document on the binary protocol, but I will open another 
>>>>>>>>>>>>>> thread on this :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> bye 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Emanuel
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 12/09/2014 11:48 AM, Rasmus Schultz wrote:
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> Doctrine is the only one of those projects that still has 
>>>>>>>>>>>>>> any traction, and it's a full-scale data mapper; what we need 
>>>>>>>>>>>>>> is a simple driver/client.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are of course referencing those projects for lots of 
>>>>>>>>>>>>>> implementation details, but we're shooting for something much 
>>>>>>>>>>>>>> simpler and more low-level, something people can use to build 
>>>>>>>>>>>>>> their own mappers/DAO/AR implementations on top of.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're also designing the whole thing using very basic OOP 
>>>>>>>>>>>>>> patterns (no traits) in the hopes of porting this to a native 
>>>>>>>>>>>>>> extension (e.g. Zephir) eventually.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We're also designing the whole thing with zero dependencies 
>>>>>>>>>>>>>> on other libraries.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So we have somewhat different objectives from the other 
>>>>>>>>>>>>>> projects, and more of a minimalist mindset, I think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>> On Tue, Dec 9, 2014 at 12:36 PM, 'Curtis Mosters' via 
>>>>>>>>>>>>>> OrientDB <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, there is no other Google Group. But why not use the 
>>>>>>>>>>>>>>> existing PHP OrientDB projects on GitHub?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://github.com/AntonTerekhov/OrientDB-PHP
>>>>>>>>>>>>>>> https://github.com/doctrine/orientdb-odm
>>>>>>>>>>>>>>> https://packagist.org/packages/orientdb-php/orientdb-php
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't know, but it would be way better to do it there. 
>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tuesday, December 9, 2014 11:38:07 AM UTC+1, 
>>>>>>>>>>>>>>> mindplay.dk wrote: 
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> Is there a different group for developers with more 
>>>>>>>>>>>>>>>> technical questions? 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  I want to help bring OrientDB to php - is this the right 
>>>>>>>>>>>>>>>> place for that? Or is nobody interested?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Monday, December 8, 2014 5:19:07 PM UTC+1, mindplay.dk 
>>>>>>>>>>>>>>>> wrote: 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm trying to tackle REQUEST_RECORD_LOAD as the first 
>>>>>>>>>>>>>>>>> useful function in my PHP client. (I have the basics like 
>>>>>>>>>>>>>>>>> connect and open, error handling, etc. working so far.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This being a PHP client, one major concern for me is to 
>>>>>>>>>>>>>>>>> avoid parsing (with a state machine, as was necessary with 
>>>>>>>>>>>>>>>>> the old format), since this is extremely inefficient in PHP. 
>>>>>>>>>>>>>>>>> This is one reason I'm targeting OrientDB 2.0 and the new 
>>>>>>>>>>>>>>>>> binary format exclusively, as this appears to make that 
>>>>>>>>>>>>>>>>> possible (?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately, the response format of REQUEST_RECORD_LOAD 
>>>>>>>>>>>>>>>>> itself appears to make that impossible.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [(payload-status:byte)[(record-content:bytes)(record-version:int)(record-type:byte)]*]+
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In order to read sequentially over "record-content", I 
>>>>>>>>>>>>>>>>> need to know the "record-type" in advance, so the order of 
>>>>>>>>>>>>>>>>> this data appears to be wrong? I believe the record format 
>>>>>>>>>>>>>>>>> of each payload chunk would need to be reversed, basically:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  [(payload-status:byte)[(record-type:byte)(record-version:int)(record-content:bytes)]*]+
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Otherwise, I am forced to load the whole record-content 
>>>>>>>>>>>>>>>>> into memory first, before I can know how to interpret the 
>>>>>>>>>>>>>>>>> data.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Or am I missing something here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, it appears the "record-content" is in the old CSV 
>>>>>>>>>>>>>>>>> format, regardless of my having selected the new binary 
>>>>>>>>>>>>>>>>> serialization format? Does the REQUEST_RECORD_LOAD command 
>>>>>>>>>>>>>>>>> not support the new binary serialization format? Is it not 
>>>>>>>>>>>>>>>>> supported everywhere yet?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I really do not want a client that has to load and then 
>>>>>>>>>>>>>>>>> parse in two stages. This adds considerable complexity and 
>>>>>>>>>>>>>>>>> run-time overhead, and duplicates everything in memory while 
>>>>>>>>>>>>>>>>> loading. I'm probably doing something wrong or missing 
>>>>>>>>>>>>>>>>> something obvious?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>       -- 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --- 
>>>>>>>>>>>>>>> You received this message because you are subscribed to a 
>>>>>>>>>>>>>>> topic in the Google Groups "OrientDB" group.
>>>>>>>>>>>>>>> To unsubscribe from this topic, visit 
>>>>>>>>>>>>>>> https://groups.google.com/d/topic/orient-database/9CKEun_WrrA/unsubscribe.
>>>>>>>>>>>>>>> To unsubscribe from this group and all its topics, send an 
>>>>>>>>>>>>>>> email to [email protected].
>>>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>  -- 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --- 
>>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>>> Google Groups "OrientDB" group.
>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

