Hi Steve,

OrientDB already has a charset setting at the database level. To change it:
  alter database charset utf-8

Maybe we could treat char like you did with integer: save the space if the content doesn't need 2 bytes.

Lvc@

On 15 May 2014 04:17, Steve <[email protected]> wrote:

> I'm just adapting the existing binary field serializers to a modified
> interface and looking at the existing OStringSerializer. I notice it
> serializes char by char (i.e. 2 bytes per char). Given that under most
> charsets the vast majority of text is represented as a single byte per
> character, I wonder if we could handle this safely using
> String.getBytes(charset).
>
> The question is: is there a charset that is a superset of all charsets?
> i.e. can we guarantee that the process of serialize/deserialize will
> never lose or alter data? I'm not really an expert on charsets, so I
> thought I'd throw this one out there for input.
>
> We could specify a charset per cluster or per DB in the way that MySQL
> does. It would be a pain for the user to have to specify charsets by
> default, but if the user is charset-aware then we can neatly sidestep
> this issue.
>
> Any ideas on the best way to handle this? It would be a shame to double
> the storage size of every string in the DB if it's not necessary.
>
> On 15/05/14 01:22, Luca Garulli wrote:
>
> Hi Steve,
> I guessed you were super busy, no problem about it. The Binary Protocol
> will be the first thing Emanuele will work on, starting from the end of
> May. Very soon he'll contact you for some information about the last
> version you pushed. He'll help you integrate your implementation inside
> OrientDB so that all the test cases (thousands of them) pass.
>
> Thanks,
> Lvc@
>
> On 14 May 2014 13:26, Steve <[email protected]> wrote:
>
>> If I read his last email on the subject correctly, he already has.
>>
>> Again, sorry to Luca for not responding; I missed the email when he
>> sent it.
>>
>> On 14/05/14 21:19, [email protected] wrote:
>>
>> Hi,
>>
>> This is good news; now let's hope Luca can find resources for this soon.
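On the superset question: UTF-8 can encode every Unicode code point, so a `String.getBytes`/`new String` round trip under UTF-8 is lossless for any well-formed Java String, while ASCII text still takes one byte per character. A minimal sketch of the idea (class and method names here are illustrative, not part of the OrientDB API):

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not OrientDB API: UTF-8 string round trip.
public class Utf8StringCodec {

    // One byte per char for ASCII, up to 4 bytes for supplementary
    // characters; never 2 bytes per char across the board.
    public static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless for well-formed strings: UTF-8 covers all of Unicode.
    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String mixed = "h\u00e9llo \u4e16\u754c"; // Latin-1 accent + CJK
        System.out.println(serialize(ascii).length);                      // 5
        System.out.println(deserialize(serialize(mixed)).equals(mixed));  // true
    }
}
```

The one caveat is a String containing unpaired surrogates, which the JDK encoder replaces rather than round-trips; real text is unaffected.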
>>
>> Regards,
>> -Stefán
>>
>> On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:
>>>
>>> Hi Stefan,
>>>
>>> Progress has been slow, as I ran into the usual issue: got bogged down
>>> in problems, became obsessed, ended up spending far more time than I
>>> expected, got shit from my employer for neglecting my work, panicked to
>>> catch up, never got back to it ;)
>>>
>>> However, I did push an update a couple of days ago. Although many of
>>> the extras have not been addressed, I'm now able to persist a binary
>>> record inside OrientDB and retrieve it after a restart (proving that
>>> it's deserialized from disk, not from cache), which implies also being
>>> able to persist the drastically altered schema structure.
>>>
>>> Since I had made the field-level serializer pluggable, I've been using
>>> jackson-json as the serialization mechanism for easy debugging. Now I
>>> need to adjust the existing ODB binary serializers. They all embed the
>>> data length in the serialized data, which we don't need to do since we
>>> store it in headers. And I've adjusted the interface slightly. So I
>>> just need to massage the existing binary serializers a little to fit
>>> the new interface and we will be back to full binary serialization.
>>>
>>> So... some progress, nowhere near as much as I'd hoped, but now that
>>> it actually works inside ODB (before, we could only
>>> serialize/deserialize to byte arrays using dummy schema objects) I
>>> believe it's at a point where we can get other ODB developers involved
>>> to review/test/contribute.
>>>
>>> I've just noticed a post Luca made a while back, which I had missed,
>>> saying he'd employed someone who'll be focussed on this, so I hope we
>>> can work together on the rest of the integration. Honestly,
>>> integration has been the hardest part.
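The header-stored-length design Steve describes (serializers no longer embed a length prefix, because the record header already carries each field's byte length) can be sketched as a tiny interface. The thread never shows the real interface, so every name below is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual OrientDB interface: the payload
// carries no length prefix; the caller supplies the length read from
// the record header.
interface FieldSerializer<T> {
    byte[] serialize(T value);

    // `length` comes from the record header, not from the payload.
    T deserialize(ByteBuffer buffer, int length);
}

class StringFieldSerializer implements FieldSerializer<String> {
    @Override
    public byte[] serialize(String value) {
        // No length prefix: just the raw UTF-8 payload.
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(ByteBuffer buffer, int length) {
        byte[] payload = new byte[length];
        buffer.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Dropping the per-field length prefix saves a few bytes per field and avoids storing the same information twice.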
>>> I've learned an awful lot about the internals of ODB the hard way
>>> (apologies for the blunt comment, but the documentation is awful and
>>> it's very hard to distinguish what is internal vs. public API), and
>>> I've also learned I've probably only touched a tiny fraction of it.
>>>
>>> On 14/05/14 19:40, [email protected] wrote:
>>>
>>> Hi,
>>>
>>> Has something newsworthy happened on this? :)
>>>
>>> Best regards,
>>> -Stefán
>>>
>>> On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:
>>>>
>>>>> Slightly different issue, I think. I wasn't clear: I was actually
>>>>> talking about versioning of individual class schemas rather than a
>>>>> global schema version. This is the part that allows modifying the
>>>>> schema while (in some cases) avoiding having to scan/rewrite all
>>>>> records in the class. Although this is a nice feature to have, it's
>>>>> really quite a separate problem from binary serialization, so I
>>>>> decided to treat them as separate issues, since trying to deal with
>>>>> both at once was really bogging me down. Looking at your issue,
>>>>> though, I'd note that my subclasses of OClassImpl and OPropertyImpl
>>>>> are actually immutable once constructed, so this might help the
>>>>> schema-wide immutability.
>>>>
>>>> Good, this would simplify that issue.
>>>>
>>>>>> Also realised that per-record compression will be rather easy to
>>>>>> do... But that's in the extras bucket, so I'll leave it as a bonus
>>>>>> prize once the core functions are sorted and stable.
>>>>>
>>>>> We already have per-record compression; what do you mean?
>>>>>
>>>>> I wasn't aware of this. Perhaps this occurs in the raw database
>>>>> layer of the code? I haven't come across any compression code. If
>>>>> you already have per-record compression, does this negate any
>>>>> potential value of per-field compression? i.e.
>>>>> if (string.length > 1000) compressString()
>>>>
>>>> We compress at storage level, but always, not with a threshold.
>>>> This brings no compression benefit in the case of small records, so
>>>> compression at marshalling time would be preferable: drivers could
>>>> send compressed records to improve network I/O.
>>>>
>>>> Lvc@

--
---
You received this message because you are subscribed to the Google Groups
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.
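The threshold idea from the exchange above (compress a field only when it is large enough to benefit, so small records avoid the overhead Luca describes) might look like the following sketch. The class name and the 1000-byte threshold are illustrative assumptions taken from Steve's `if (string.length > 1000)` example, not an OrientDB API:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative threshold-based per-field compression: small payloads are
// stored raw behind a one-byte flag, large ones are deflated.
public class ThresholdCompressor {
    static final int THRESHOLD = 1000; // assumed cutoff from the thread

    public static byte[] pack(byte[] raw) {
        if (raw.length <= THRESHOLD) {
            byte[] out = new byte[raw.length + 1];
            out[0] = 0;                          // flag: stored raw
            System.arraycopy(raw, 0, out, 1, raw.length);
            return out;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(1);                            // flag: deflated
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return bos.toByteArray();
    }

    public static byte[] unpack(byte[] packed) {
        if (packed[0] == 0) {
            return Arrays.copyOfRange(packed, 1, packed.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(packed, 1, packed.length - 1);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                bos.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt field payload", e);
        }
        inflater.end();
        return bos.toByteArray();
    }
}
```

Because the flag byte travels with the payload, a driver could apply the same scheme before sending records over the wire, which is exactly the network I/O benefit Luca mentions.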
