Hi Steve,

OrientDB already has a charset setting at the database level. To change it:
  alter database charset utf-8

Maybe we could treat char like you did with integer: save the space if the content doesn't need 2 bytes.

Lvc@

On 15 May 2014 04:17, Steve <[email protected]> wrote:

> I'm just adapting the existing binary field serializers to a modified
> interface and looking at the existing OStringSerializer. I notice it
> serializes char by char (i.e. 2 bytes per char). Given that under most
> charsets the vast majority of text is represented as a single byte per
> character, I wonder if we could handle this safely using
> String.getBytes(charset).
>
> The question is: is there a charset that is a superset of all charsets?
> i.e. can we guarantee that the process of serialize/deserialize will
> never lose or alter data? I'm not really an expert on charsets, so I
> thought I'd throw this one out there for input.
>
> We could specify a charset per cluster or per DB in the way that MySQL
> does. It would be a pain for the user to have to specify charsets by
> default, but if the user is charset-aware then we can neatly sidestep
> this issue.
>
> Any ideas on the best way to handle this? It would be a shame to double
> the storage size of every string in the DB if it's not necessary.
>
> On 15/05/14 01:22, Luca Garulli wrote:
>
> Hi Steve,
> I guessed you were super busy, no problem about it. The Binary Protocol
> will be the first thing Emanuele will work on, starting from the end of
> May. Very soon he'll contact you for some information about the last
> version you pushed. He'll help you integrate your implementation inside
> OrientDB so that all the test cases (thousands of them) pass.
>
> Thanks,
> Lvc@
>
> On 14 May 2014 13:26, Steve <[email protected]> wrote:
>
>> If I read his last email on the subject correctly, he already has.
>>
>> Again, sorry to Luca for not responding; I missed the email when he
>> sent it.
>>
>> On 14/05/14 21:19, [email protected] wrote:
>>
>> Hi,
>>
>> This is good news; now let's hope Luca can find resources for this soon.
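On the superset question: UTF-8 can encode every Unicode code point, so a `String.getBytes`/`new String` round trip under UTF-8 is lossless for any well-formed Java String, while ASCII text still takes one byte per character. A minimal sketch of the idea (class and method names here are illustrative, not part of the OrientDB API):

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not OrientDB API: UTF-8 string round trip.
public class Utf8StringCodec {

    // One byte per char for ASCII, up to 4 bytes for supplementary
    // characters; never 2 bytes per char across the board.
    public static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless for well-formed strings: UTF-8 covers all of Unicode.
    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String mixed = "h\u00e9llo \u4e16\u754c"; // Latin-1 accent + CJK
        System.out.println(serialize(ascii).length);                      // 5
        System.out.println(deserialize(serialize(mixed)).equals(mixed));  // true
    }
}
```

The one caveat is a String containing unpaired surrogates, which the JDK encoder replaces rather than round-trips; real text is unaffected.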
>>
>> Regards,
>> -Stefán
>>
>> On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:
>>>
>>> Hi Stefan,
>>>
>>> Progress has been slow, as I ran into the usual issue: got bogged down
>>> in problems, became obsessed, ended up spending far more time than I
>>> expected, got shit from my employer for neglecting my work, panicked to
>>> catch up, never got back to it ;)
>>>
>>> However, I did push an update a couple of days ago. Although many of
>>> the extras have not been addressed, I'm now able to persist a binary
>>> record inside OrientDB and retrieve it after a restart (proving that
>>> it's deserialized from disk, not from cache), which implies also being
>>> able to persist the drastically altered schema structure.
>>>
>>> Since I had made the field-level serializer pluggable, I've been using
>>> jackson-json as the serialization mechanism for easy debugging. Now I
>>> need to adjust the existing ODB binary serializers. They all embed the
>>> data length in the serialized data, which we don't need to do since we
>>> store it in headers. And I've adjusted the interface slightly. So I
>>> just need to massage the existing binary serializers a little to fit
>>> the new interface and we will be back to full binary serialization.
>>>
>>> So... some progress, nowhere near as much as I'd hoped, but now that
>>> it actually works inside ODB (before, we could only
>>> serialize/deserialize to byte arrays using dummy schema objects) I
>>> believe it's at a point where we can get other ODB developers involved
>>> to review/test/contribute.
>>>
>>> I've just noticed a post Luca made a while back, which I had missed,
>>> saying he'd employed someone who'll be focussed on this, so I hope we
>>> can work together on the rest of the integration. Honestly,
>>> integration has been the hardest part.
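The header-stored-length design Steve describes (serializers no longer embed a length prefix, because the record header already carries each field's byte length) can be sketched as a tiny interface. The thread never shows the real interface, so every name below is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the actual OrientDB interface: the payload
// carries no length prefix; the caller supplies the length read from
// the record header.
interface FieldSerializer<T> {
    byte[] serialize(T value);

    // `length` comes from the record header, not from the payload.
    T deserialize(ByteBuffer buffer, int length);
}

class StringFieldSerializer implements FieldSerializer<String> {
    @Override
    public byte[] serialize(String value) {
        // No length prefix: just the raw UTF-8 payload.
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(ByteBuffer buffer, int length) {
        byte[] payload = new byte[length];
        buffer.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

Dropping the per-field length prefix saves a few bytes per field and avoids storing the same information twice.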
>>> I've learned an awful lot about the internals of ODB the hard way
>>> (apologies for the blunt comment, but the documentation is awful and
>>> it's very hard to distinguish what is internal vs. public API), and
>>> I've also learned I've probably only touched a tiny fraction of it.
>>>
>>> On 14/05/14 19:40, [email protected] wrote:
>>>
>>> Hi,
>>>
>>> Has something newsworthy happened on this? :)
>>>
>>> Best regards,
>>> -Stefán
>>>
>>> On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:
>>>>
>>>>> Slightly different issue, I think. I wasn't clear: I was actually
>>>>> talking about versioning of individual class schemas rather than a
>>>>> global schema version. This is the part that allows modifying the
>>>>> schema while (in some cases) avoiding having to scan/rewrite all
>>>>> records in the class. Although this is a nice feature to have, it's
>>>>> really quite a separate problem from binary serialization, so I
>>>>> decided to treat them as separate issues, since trying to deal with
>>>>> both at once was really bogging me down. Looking at your issue,
>>>>> though, I'd note that my subclasses of OClassImpl and OPropertyImpl
>>>>> are actually immutable once constructed, so this might help the
>>>>> schema-wide immutability.
>>>>
>>>> Good, this would simplify that issue.
>>>>
>>>>>> Also realised that per-record compression will be rather easy to
>>>>>> do... But that's in the extras bucket, so I'll leave it as a bonus
>>>>>> prize once the core functions are sorted and stable.
>>>>>
>>>>> We already have per-record compression; what do you mean?
>>>>>
>>>>> I wasn't aware of this. Perhaps this occurs in the raw database
>>>>> layer of the code? I haven't come across any compression code. If
>>>>> you already have per-record compression, does this negate any
>>>>> potential value of per-field compression? i.e.
>>>>> if (string.length > 1000) compressString()
>>>>
>>>> We compress at storage level, but always, not with a threshold.
>>>> This brings no compression benefit in the case of small records, so
>>>> compression at marshalling time would be preferable: drivers could
>>>> send compressed records to improve network I/O.
>>>>
>>>> Lvc@

--
---
You received this message because you are subscribed to the Google Groups
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
For more options, visit https://groups.google.com/d/optout.
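The threshold idea from the exchange above (compress a field only when it is large enough to benefit, so small records avoid the overhead Luca describes) might look like the following sketch. The class name and the 1000-byte threshold are illustrative assumptions taken from Steve's `if (string.length > 1000)` example, not an OrientDB API:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative threshold-based per-field compression: small payloads are
// stored raw behind a one-byte flag, large ones are deflated.
public class ThresholdCompressor {
    static final int THRESHOLD = 1000; // assumed cutoff from the thread

    public static byte[] pack(byte[] raw) {
        if (raw.length <= THRESHOLD) {
            byte[] out = new byte[raw.length + 1];
            out[0] = 0;                          // flag: stored raw
            System.arraycopy(raw, 0, out, 1, raw.length);
            return out;
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(1);                            // flag: deflated
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return bos.toByteArray();
    }

    public static byte[] unpack(byte[] packed) {
        if (packed[0] == 0) {
            return Arrays.copyOfRange(packed, 1, packed.length);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(packed, 1, packed.length - 1);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                bos.write(buf, 0, inflater.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt field payload", e);
        }
        inflater.end();
        return bos.toByteArray();
    }
}
```

Because the flag byte travels with the payload, a driver could apply the same scheme before sending records over the wire, which is exactly the network I/O benefit Luca mentions.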
