On 15 May 2014 10:00, Steve <[email protected]> wrote:

> Is there a way to access this programmatically (without having to do a db
> query every time)?
you can get it by:

    String charset = db.getStorage().getConfiguration().getCharset();

> I found OBinarySerializer.bytesToString() and stringToBytes(), which
> appear to use a single-byte encoding for characters where possible. I
> think (but I can't say for certain) that this will result in a
> charset-agnostic encoding of each char.
>
> The other option (the way I normally do this) is to use
> String.getBytes(charset), which we could do if there is a global DB
> charset setting. However, we would run into an issue where, if the
> charset were changed, we might have to rewrite every string in the
> database?

You're right. Maybe we could use UTF-8/16 as the charset, as a superset of
all charsets?

Lvc@

> On 15/05/14 17:32, Luca Garulli wrote:
>
> Hi Steve,
> OrientDB already has a charset setting at database level. To change it:
>
>     alter database charset utf-8
>
> Maybe we could treat char like you did with integer: save the bits if
> the content doesn't use 2 bytes.
>
> Lvc@
>
> On 15 May 2014 04:17, Steve <[email protected]> wrote:
>
>> I'm just adapting the existing binary field serializers to a modified
>> interface and looking at the existing OStringSerializer. I notice it
>> serializes char by char (i.e. 2 bytes per char). Given that under most
>> charsets the vast majority of text is represented as a single byte per
>> char, I wonder if we could handle this safely using
>> String.getBytes(charset).
>>
>> The question is: is there a charset that is a superset of all charsets?
>> I.e., can we guarantee that the process of serialize/deserialize will
>> never lose or alter data? I'm not really an expert on charsets, so I
>> thought I'd throw this one out there for input.
>>
>> We could specify a charset per cluster or per DB in the way that MySQL
>> does. It would be a pain for users to have to specify charsets by
>> default, but if the user is charset-aware then we can neatly sidestep
>> this issue.
>>
>> Any ideas on the best way to handle this?
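The String.getBytes(charset) idea under discussion can be sketched as below. This is a minimal standalone example, not OrientDB code: `stringToBytes`/`bytesToString` are hypothetical helpers assuming the database-level charset is UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Hypothetical helpers, assuming the database-level charset is UTF-8.
    // This replaces OStringSerializer's 2-bytes-per-char scheme with a
    // variable-width encoding.
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 is a "superset" in the sense that it can encode every
        // Unicode code point, so well-formed strings round-trip losslessly.
        String text = "héllo \u4f60\u597d";
        System.out.println(bytesToString(stringToBytes(text)).equals(text)); // true

        // For mostly-ASCII text this roughly halves the storage cost:
        String ascii = "plain ascii text";
        System.out.println(stringToBytes(ascii).length); // 16 bytes in UTF-8
        System.out.println(ascii.length() * 2);          // 32 bytes at 2 bytes/char
    }
}
```

One caveat on the "never lose or alter data" question: a Java String containing an unpaired surrogate is not well-formed Unicode, and getBytes with UTF-8 silently replaces it with '?', so the lossless guarantee holds only for valid Unicode strings.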
>> It would be a shame to double the storage size of every string in the DB
>> if it's not necessary.
>>
>> On 15/05/14 01:22, Luca Garulli wrote:
>>
>> Hi Steve,
>> I guessed you were super busy, no problem about it. The binary protocol
>> will be the first thing Emanuele will work on, starting from the end of
>> May. Very soon he'll contact you for some information about the last
>> version you pushed. He'll help you integrate your implementation inside
>> OrientDB so that all the test cases pass (thousands).
>>
>> Thanks,
>> Lvc@
>>
>> On 14 May 2014 13:26, Steve <[email protected]> wrote:
>>
>>> If I read his last email on the subject correctly, he already has.
>>>
>>> Again, sorry to Luca for not responding; I missed the email when he
>>> sent it.
>>>
>>> On 14/05/14 21:19, [email protected] wrote:
>>>
>>> Hi,
>>>
>>> This is good news. Now let's hope Luca can find resources for this soon.
>>>
>>> Regards,
>>> -Stefán
>>>
>>> On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:
>>>>
>>>> Hi Stefan,
>>>>
>>>> Progress has been slow, as I ran into the usual issue: got bogged down
>>>> in problems, became obsessed, ended up spending far more time than I
>>>> expected, got grief from my employer for neglecting my work, panicked
>>>> to catch up, and never got back to it ;)
>>>>
>>>> However, I did push an update a couple of days ago. Although many of
>>>> the extras have not been addressed, I'm now able to persist a binary
>>>> record inside OrientDB and retrieve it after a restart (proving that
>>>> it's deserialized from disk, not from cache). Which implies also being
>>>> able to persist the drastically altered schema structure.
>>>>
>>>> Since I had made the field-level serializer pluggable, I've been using
>>>> Jackson JSON as the serialization mechanism for easy debugging. Now I
>>>> need to adjust the existing ODB binary serializers.
>>>> They all embed the data length in the serialized data, which we don't
>>>> need to do since we store it in headers. And I've adjusted the
>>>> interface slightly. So I just need to massage the existing binary
>>>> serializers a little to fit the new interface and we will be back to
>>>> full binary serialization.
>>>>
>>>> So... some progress, nowhere near as much as I'd hoped, but now that
>>>> it actually works inside ODB (before, we could only
>>>> serialize/deserialize to byte arrays using dummy schema objects) I
>>>> believe it's at a point where we can get other ODB developers involved
>>>> to review/test/contribute.
>>>>
>>>> I've just noticed a post Luca made a while back, which I missed, that
>>>> he'd employed someone who'll be focused on this, so I hope we can work
>>>> together on the rest of the integration. Honestly, integration has
>>>> been the hardest part. I've learned an awful lot about the internals
>>>> of ODB the hard way (apologies for the blunt comment, but the
>>>> documentation is awful and it's very hard to distinguish what is
>>>> internal vs. public API) and also learned I've probably only touched a
>>>> tiny fraction of it.
>>>>
>>>> On 14/05/14 19:40, [email protected] wrote:
>>>>
>>>> Hi,
>>>>
>>>> Has something newsworthy happened on this? :)
>>>>
>>>> Best regards,
>>>> -Stefán
>>>>
>>>> On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:
>>>>>
>>>>>> Slightly different issue, I think. I wasn't clear: I was actually
>>>>>> talking about versioning of individual class schemas rather than a
>>>>>> global schema version. This is the part that allows you to modify
>>>>>> the schema and (in some cases) avoid having to scan/rewrite all
>>>>>> records in the class. Although this is a nice feature to have, it's
>>>>>> really quite a separate problem from binary serialization, so I
>>>>>> decided to treat them as separate issues, since trying to deal with
>>>>>> both at once was really bogging me down.
>>>>>> Looking at your issue, though, I'd note that my subclasses of
>>>>>> OClassImpl and OPropertyImpl are actually immutable once
>>>>>> constructed, so this might help with schema-wide immutability.
>>>>>
>>>>> Good, this would simplify that issue.
>>>>>
>>>>>>> Also realised that per-record compression will be rather easy to
>>>>>>> do... But that's in the extras bucket, so I'll leave that as a
>>>>>>> bonus prize once the core functions are sorted and stable.
>>>>>>
>>>>>> We already have per-record compression, what do you mean?
>>>>>>
>>>>>> I wasn't aware of this. Perhaps this occurs in the raw database
>>>>>> layer of the code? I haven't come across any compression code. If
>>>>>> you already have per-record compression, does this negate any
>>>>>> potential value of per-field compression? I.e.:
>>>>>> if (string.length > 1000) compressString()
>>>>>
>>>>> We compress at storage level, but always, not with a threshold. This
>>>>> brings no compression benefit in the case of small records, so
>>>>> compression at marshalling time would be preferable: drivers could
>>>>> send compressed records to improve network I/O.
>>>>>
>>>>> Lvc@
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
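The threshold idea mentioned above (`if (string.length > 1000) compressString()`) could look something like this at marshalling time. This is a sketch only: `FieldCompression`, the threshold value, and the helper names are made up for illustration, using java.util.zip rather than any OrientDB compression API. A real format would also need a flag (e.g. in the record header) to mark whether a field was compressed; that bookkeeping is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FieldCompression {
    // Hypothetical threshold: small fields are stored raw, since
    // compressing them costs CPU for little or no size benefit.
    static final int THRESHOLD = 1000;

    static byte[] maybeCompress(byte[] raw) {
        if (raw.length <= THRESHOLD) {
            return raw; // below threshold: store as-is
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] small = "short field".getBytes();
        System.out.println(maybeCompress(small) == small); // true: stored raw

        byte[] large = "x".repeat(2000).getBytes();
        byte[] packed = maybeCompress(large);
        System.out.println(packed.length < large.length);  // true: compressed
        System.out.println(new String(decompress(packed)).equals("x".repeat(2000))); // true
    }
}
```

Doing this at marshalling time matches the point above: drivers could ship already-compressed records over the wire, rather than compressing only at the storage layer.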
