Hi,
Absolutely yes! Emanuele is in charge of this. We already have the first version working in 2.0-SNAPSHOT, but we're still working on reducing the space it uses.
Emanuele can be more specific, but I think the first public beta of this feature could be next week.

Lvc@

On 6 August 2014 20:29, Stefán <[email protected]> wrote:

> Hi guys,
>
> Have you been able to make some progress on this?
>
> Anxiously awaiting :)
>
> Best regards,
> -Stefan
>
>
> On Thursday, 15 May 2014 09:05:30 UTC, Steve Coughlan wrote:
>
>> >maybe we could use UTF-8/16 as charset as super set of all charsets?
>>
>> Which raises the question... Is it safe to assume that UTF-8 IS a
>> superset of all charsets? My lack of charset expertise showing through
>> here ;)
>>
>>
>> On 15/05/14 19:02, Luca Garulli wrote:
>>
>> On 15 May 2014 10:00, Steve <[email protected]> wrote:
>>
>> Is there a way to access this programmatically (without having to do a db
>> query every time)?
>>
>>
>> you can get it by:
>>
>> String charset = db.getStorage().getConfiguration().getCharset()
>>
>> I found OBinarySerializer.bytesToString() and stringToBytes() which
>> appear to use single byte encoding for characters where it's possible. I
>> think (but I can't say for certain) that this will result in a charset
>> agnostic encoding of each char.
>>
>> The other option (the way I normally do this) is to use
>> String.getBytes(charset). We could do that if there is a global DB
>> charset setting, however we would run into an issue where if the charset was
>> changed we may have to rewrite every string in the database?
>>
>>
>> You're right, maybe we could use UTF-8/16 as charset as super set of
>> all charsets?
>>
>> Lvc@
>>
>>
>> On 15/05/14 17:32, Luca Garulli wrote:
>>
>> Hi Steve,
>> OrientDB already has a charset setting at database level, to change it:
>>
>> alter database charset utf-8
>>
>> Maybe we could treat char like you did with integer: save the bits if
>> the content doesn't use 2 bytes.
>>
>> Lvc@
>>
>> On 15 May 2014 04:17, Steve <[email protected]> wrote:
>>
>> I'm just adapting the existing binary field serializers to a modified
>> interface and looking at the existing OStringSerializer. I notice it
>> serializes char by char (i.e. 2 bytes per char). Given that under most
>> charsets the vast majority of text is represented as a single byte per char,
>> I wonder if we could handle this safely using String.getBytes(charset).
>>
>> The question is, is there a charset that is a superset of all charsets?
>> i.e. can we guarantee that the process of serialize/deserialize will never
>> lose or alter data? I'm not really an expert on charsets so I thought I'd
>> throw this one out there for input.
>>
>> We could specify a charset per cluster or per DB in the way that mysql
>> does. It would be a pain for the user to have to be specifying charsets by
>> default. But if the user is charset aware then we can neatly sidestep this
>> issue.
>>
>> Any ideas on the best way to handle this? It would be a shame to double
>> the storage size of every string in the DB if it's not necessary.
>>
>> On 15/05/14 01:22, Luca Garulli wrote:
>>
>> Hi Steve,
>> I guessed you were super busy, no problem about it. Binary Protocol will
>> be the first thing Emanuele will work on starting from the end of May. Very
>> soon he'll contact you to get some information about the last version you
>> pushed. He'll help you to integrate your implementation inside OrientDB to
>> let all the test cases pass (thousands).
>>
>> Thanks,
>> Lvc@
>>
>>
>> On 14 May 2014 13:26, Steve <[email protected]> wrote:
>>
>> If I read his last email on the subject correctly he already has.
>>
>> Again sorry to Luca for not responding, I missed the email when he sent
>> it.
>>
>>
>> On 14/05/14 21:19, [email protected] wrote:
>>
>> Hi,
>>
>> This is good news, now let's hope Luca can find resources for this soon.
>>
>> Regards,
>> -Stefán
>>
>> On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:
>>
>> Hi Stefan,
>>
>> Progress has been slow, though, as I ran into the usual issue: got bogged
>> down in issues, became obsessed, ended up spending far more time than I
>> expected, got shit from my employer for neglecting my work, panicked
>> to catch up, never got back to it ;)
>>
>> However I did push an update a couple of days ago. Although many of the
>> extras have not been addressed I'm now able to persist a binary record
>> inside orientdb and retrieve it after a restart (proving that it's
>> deserialized from disk not from cache). Which implies also being able to
>> persist the drastically altered schema structure.
>>
>> Since I had made the field-level serializer pluggable I've been using
>> jackson-json as the serialization mechanism for easy debugging. Now I need
>> to adjust the existing ODB binary serializers. They all embed data-length
>> in the serialized data, which we don't need to do since we store it in
>> headers. And I've adjusted the interface slightly. So I just need to
>> massage the existing binary serializers a little to fit the new interface
>> and we will be back to full binary serialization.
>>
>> So... some progress, nowhere near as much as I'd hoped but now that it
>> actually works inside ODB (before we could only serialize/deserialize to
>> byte arrays using dummy schema objects) I believe it's at a point where we
>> can get other ODB developers involved to review/test/contribute.
>>
>> I've just noticed a post Luca made a while back that I missed that he'd
>> employed someone who'll be focussed on this so I hope we can work together
>> on the rest of the integration. Honestly integration has been the hardest
>> part. I've learned an awful lot about the internals of ODB the hard way
>> (apologies for blunt comment but the documentation is awful and it's very
>> hard to distinguish what is internal/public API) and also learned I've
>> probably only touched a tiny fraction of it.
>>
>>
>> On 14/05/14 19:40, [email protected] wrote:
>>
>> Hi,
>>
>> Has something newsworthy happened on this? :)
>>
>> Best regards,
>> -Stefán
>>
>>
>> On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:
>>
>>
>> Slightly different issue I think. I wasn't clear I was actually talking
>> versioning of individual class schemas rather than global schema version.
>> This is the part that allows modifying the schema and (in some cases) avoiding
>> having to scan/rewrite all records in the class. Although this is a nice
>> feature to have it's really quite a separate problem from binary
>> serialization so I decided to treat them as separate issues since trying to
>> deal with both at once was really bogging me down. Looking at your issue
>> though I'd note that my subclasses of OClassImpl and OPropertyImpl are
>> actually immutable once constructed so this might help the schema-wide
>> immutability.
>>
>>
>> Good, this would simplify that issue.
>>
>>
>> Also realised that per record compression will be rather easy to
>> do... But that's in the extras bucket so will leave that as a bonus prize
>> once the core functions are sorted and stable.
>>
>>
>> We already have per record compression, what do you mean?
>>
>>
>> I wasn't aware of this. Perhaps this occurs in the Raw database layer
>> of the code? I haven't come across any compression code. If you already
>> have per record compression does this negate any potential value to per
>> field compression? i.e. if (string.length > 1000) compressString()
>>
>>
>> We compress at storage level, but always, not with a threshold. This
>> brings no compression benefit in the case of small records, so compression
>> at marshalling time would be preferable: drivers could send compressed
>> records to improve network I/O.
>>
>> Lvc@
>>
>> ...
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
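
On the question raised in the quoted thread (is UTF-8 a safe superset, and does it actually save space over 2 bytes per char?), here is a minimal sketch in plain Java. The class and method names (CharsetRoundTrip, roundTrips, encodedSize, twoBytesPerCharSize) are made up for illustration and are not OrientDB API; only String.getBytes(Charset) and the String(byte[], Charset) constructor are standard JDK calls. It checks whether a string survives an encode/decode cycle and compares the encoded size with the char-by-char size.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OrientDB: compares charset-based string
// serialization with the 2-bytes-per-char approach discussed in the thread.
final class CharsetRoundTrip {

    /** Encodes with the given charset, decodes again, and reports whether the text is unchanged. */
    static boolean roundTrips(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset).equals(text);
    }

    /** Number of bytes needed to store the text in the given charset. */
    static int encodedSize(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Number of bytes needed when every Java char is written as 2 bytes. */
    static int twoBytesPerCharSize(String text) {
        return text.length() * 2;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        String[] samples = { "plain ascii", "Stef\u00e1n", "\u65e5\u672c\u8a9e" };
        for (String s : samples) {
            System.out.println(s
                + " utf8=" + encodedSize(s, StandardCharsets.UTF_8)
                + " twoBytesPerChar=" + twoBytesPerCharSize(s)
                + " utf8RoundTrip=" + roundTrips(s, StandardCharsets.UTF_8)
                + " latin1RoundTrip=" + roundTrips(s, StandardCharsets.ISO_8859_1));
        }
    }
}
```

UTF-8 can represent every Unicode character, so well-formed text round-trips, while a single-byte charset such as ISO-8859-1 silently replaces what it cannot map. The space saving is not universal: ASCII-heavy text halves in size, but CJK text takes 3 bytes per character in UTF-8 against 2 in the char-by-char form. A Java String holding an unpaired surrogate is not well-formed Unicode and will not round-trip through UTF-8.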
