Is there a way to access this programmatically (without having to do a db query every time)?
I found OBinarySerializer.bytesToString() and stringToBytes(), which appear to use single-byte encoding for characters where possible. I think (but can't say for certain) that this results in a charset-agnostic encoding of each char. The other option (the way I normally do this) is String.getBytes(charset). We could do that if there were a global DB charset setting, but we'd run into the issue that if the charset were ever changed we might have to rewrite every string in the database.

On 15/05/14 17:32, Luca Garulli wrote:
> Hi Steve,
> OrientDB already has a charset setting at database level; to change it:
>
>     alter database charset utf-8
>
> Maybe we could treat char like you did with integer: save the bits if
> the content doesn't use 2 bytes.
>
> Lvc@
>
> On 15 May 2014 04:17, Steve <[email protected]> wrote:
>
>     I'm just adapting the existing binary field serializers to a
>     modified interface and looking at the existing OStringSerializer.
>     I notice it serializes char by char (i.e. 2 bytes per char).
>     Given that under most charsets the vast majority of text is
>     represented as a single byte per char, I wonder if we could
>     handle this safely using String.getBytes(charset).
>
>     The question is: is there a charset that is a superset of all
>     charsets? i.e. can we guarantee that the process of
>     serialize/deserialize will never lose or alter data? I'm not
>     really an expert on charsets so I thought I'd throw this one out
>     there for input.
>
>     We could specify a charset per cluster or per DB in the way that
>     mysql does. It would be a pain for the user to have to specify
>     charsets by default, but if the user is charset-aware then we
>     can neatly sidestep this issue.
>
>     Any ideas on the best way to handle this? It would be a shame to
>     double the storage size of every string in the DB if it's not
>     necessary.
>
>     On 15/05/14 01:22, Luca Garulli wrote:
>>     Hi Steve,
>>     I guessed you were super busy, no problem about it.
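On the "superset of all charsets" question: UTF-8 (the value in Luca's `alter database charset utf-8` example) round-trips any well-formed Java String losslessly while still storing ASCII-heavy text at one byte per char. A minimal sketch, not OrientDB's actual serializer API (the class name here is hypothetical); note that unpaired UTF-16 surrogates in a malformed String are replaced during encoding, so the guarantee only holds for well-formed strings:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: UTF-8 round-trip for string serialization.
// UTF-8 encodes ASCII chars as 1 byte, so most Western text halves
// the 2-bytes-per-char cost of serializing char by char.
public class Utf8StringCodec {

    public static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "hello world";
        String mixed = "héllo wörld \u4e16\u754c";

        // ASCII text: 1 byte per char under UTF-8
        System.out.println(stringToBytes(ascii).length);  // 11

        // Round-trip is lossless for well-formed strings
        System.out.println(bytesToString(stringToBytes(mixed)).equals(mixed));  // true
    }
}
```

Pinning the on-disk encoding to UTF-8 would also sidestep the rewrite-on-charset-change problem, since the stored bytes never depend on a mutable DB setting.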
>>     Binary Protocol will be the first thing Emanuele will work on
>>     starting from the end of May. Very soon he'll contact you for
>>     some information about the last version you pushed. He'll help
>>     you integrate your implementation inside OrientDB so that all
>>     the test cases pass (thousands).
>>
>>     Thanks,
>>     Lvc@
>>
>>     On 14 May 2014 13:26, Steve <[email protected]> wrote:
>>
>>         If I read his last email on the subject correctly, he
>>         already has.
>>
>>         Again, sorry to Luca for not responding; I missed the email
>>         when he sent it.
>>
>>         On 14/05/14 21:19, [email protected] wrote:
>>>         Hi,
>>>
>>>         This is good news, now let's hope Luca can find resources
>>>         for this soon.
>>>
>>>         Regards,
>>>         -Stefán
>>>
>>>         On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:
>>>
>>>             Hi Stefan,
>>>
>>>             Progress has been slow, as I ran into the usual
>>>             problem: got bogged down in issues, became obsessed,
>>>             ended up spending far more time than I expected, got
>>>             grief from my employer for neglecting my work,
>>>             panicked to catch up, never got back to it ;)
>>>
>>>             However, I did push an update a couple of days ago.
>>>             Although many of the extras have not been addressed,
>>>             I'm now able to persist a binary record inside
>>>             OrientDB and retrieve it after a restart (proving that
>>>             it's deserialized from disk, not from cache). This
>>>             implies also being able to persist the drastically
>>>             altered schema structure.
>>>
>>>             Since I had made the field-level serializer pluggable,
>>>             I've been using jackson-json as the serialization
>>>             mechanism for easy debugging. Now I need to adjust the
>>>             existing ODB binary serializers. They all embed
>>>             data-length in the serialized data, which we don't
>>>             need to do since we store it in headers. And I've
>>>             adjusted the interface slightly.
>>>             So I just need to massage the existing binary
>>>             serializers a little to fit the new interface, and we
>>>             will be back to full binary serialization.
>>>
>>>             So... some progress, nowhere near as much as I'd
>>>             hoped, but now that it actually works inside ODB
>>>             (before, we could only serialize/deserialize to byte
>>>             arrays using dummy schema objects) I believe it's at a
>>>             point where we can get other ODB developers involved
>>>             to review/test/contribute.
>>>
>>>             I've just noticed a post Luca made a while back, which
>>>             I missed, saying he'd employed someone who'll be
>>>             focussed on this, so I hope we can work together on
>>>             the rest of the integration. Honestly, integration has
>>>             been the hardest part. I've learned an awful lot about
>>>             the internals of ODB the hard way (apologies for the
>>>             blunt comment, but the documentation is awful and it's
>>>             very hard to distinguish what is internal vs public
>>>             API) and also learned I've probably only touched a
>>>             tiny fraction of it.
>>>
>>>             On 14/05/14 19:40, [email protected] wrote:
>>>>             Hi,
>>>>
>>>>             Has something newsworthy happened on this? :)
>>>>
>>>>             Best regards,
>>>>             -Stefán
>>>>
>>>>             On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:
>>>>
>>>>                 Slightly different issue, I think. I wasn't
>>>>                 clear: I was actually talking about versioning of
>>>>                 individual class schemas rather than a global
>>>>                 schema version. This is the part that allows one
>>>>                 to modify the schema and (in some cases) avoid
>>>>                 having to scan/rewrite all records in the class.
>>>>                 Although this is a nice feature to have, it's
>>>>                 really quite a separate problem from binary
>>>>                 serialization, so I decided to treat them as
>>>>                 separate issues since trying to deal with both at
>>>>                 once was really bogging me down. Looking at your
>>>>                 issue, though, I'd note that my subclasses of
>>>>                 OClassImpl and OPropertyImpl are actually
>>>>                 immutable once constructed, so this might help
>>>>                 the schema-wide immutability.
>>>>
>>>>                 Good, this would simplify that issue.
>>>>
>>>>>                 Also realised that per record compression will
>>>>>                 be rather easy to do... But that's in the extras
>>>>>                 bucket, so I'll leave that as a bonus prize once
>>>>>                 the core functions are sorted and stable.
>>>>>
>>>>>                 We already have per record compression, what do
>>>>>                 you mean?
>>>>
>>>>                 I wasn't aware of this. Perhaps this occurs in
>>>>                 the raw database layer of the code? I haven't
>>>>                 come across any compression code. If you already
>>>>                 have per record compression, does this negate any
>>>>                 potential value of per field compression? i.e.
>>>>                 if (string.length > 1000) compressString()
>>>>
>>>>                 We compress at storage level, but always, not
>>>>                 with a threshold. This brings no compression
>>>>                 benefit in the case of small records, so
>>>>                 compression at marshalling time would be
>>>>                 preferable: drivers could send compressed records
>>>>                 to improve network I/O.
>>>>
>>>>                 Lvc@
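The threshold idea quoted above (`if (string.length > 1000) compressString()`) could be sketched roughly like this. This is only an illustration using `java.util.zip`, not OrientDB's storage-level compression; the class name, the 1000-byte cutoff, and the one-byte flag layout are all assumptions for the sketch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of per-field threshold compression.
// Fields at or under THRESHOLD bytes are stored raw (compressing
// small payloads only adds overhead); larger ones are deflated.
// A one-byte flag records which form was used.
public class ThresholdFieldCodec {
    static final int THRESHOLD = 1000;   // assumed cutoff, as in the thread
    static final byte RAW = 0, DEFLATED = 1;

    public static byte[] encode(String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length <= THRESHOLD) {
            byte[] out = new byte[raw.length + 1];
            out[0] = RAW;
            System.arraycopy(raw, 0, out, 1, raw.length);
            return out;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(DEFLATED);
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            buf.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return buf.toByteArray();
    }

    public static String decode(byte[] data) throws DataFormatException {
        if (data[0] == RAW) {
            return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(data, 1, data.length - 1);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!inflater.finished()) {
            buf.write(chunk, 0, inflater.inflate(chunk));
        }
        inflater.end();
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws DataFormatException {
        String big = "x".repeat(5000);
        byte[] encoded = encode(big);
        System.out.println(encoded[0] == DEFLATED);       // true
        System.out.println(decode(encoded).equals(big));  // true
    }
}
```

Because the flag byte travels with the field, the same check could run at marshalling time so drivers send compressed payloads over the wire, as Luca suggests.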
--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
