Hi,

Do you already have any numbers on the expected/estimated space savings?
Regards,
-Stefán

On Friday, August 8, 2014 9:23:14 AM UTC, [email protected] wrote:

> Great, thank you both!

On Thursday, 7 August 2014 10:43:21 UTC, Emanuele wrote:

> Hi,
> Yes, we have made good progress on this. The first step was to write a schemaless binary serialization, and that is done (the specs are here: <https://github.com/orientechnologies/orientdb/wiki/Record-Schemaless-Binary-Serialization>).
> The second step is to replace the field definitions in the record (needed by the schemaless format) with the ones declared in the schema.
> The second step is a work in progress now; you can check the status in this issue: #1890 <https://github.com/orientechnologies/orientdb/issues/1890>
>
> I will post here when it is done and the new serialization is enabled by default.

On Wednesday, 6 August 2014 21:56:54 UTC+1, Lvc@ wrote:

> Hi,
> Absolutely yes! Emanuele is in charge of this. We already have the first version working in 2.0-SNAPSHOT, but we're still working to reduce the space used.
>
> Emanuele can be more specific; I think the first public beta of this feature could be next week.
>
> Lvc@

On 6 August 2014 20:29, Stefán <[email protected]> wrote:

> Hi guys,
>
> Have you been able to make some progress on this?
>
> Anxiously awaiting :)
>
> Best regards,
> -Stefan

On Thursday, 15 May 2014 09:05:30 UTC, Steve Coughlan wrote:

> > maybe we could use UTF-8/16 as charset as super set of all charsets?
>
> Which raises the question... Is it safe to assume that UTF-8 IS a superset of all charsets? My lack of charset expertise is showing through here ;)

On 15/05/14 19:02, Luca Garulli wrote:

> On 15 May 2014 10:00, Steve <[email protected]> wrote:
>
> > Is there a way to access this programmatically (without having to do a db query every time)?
> You can get it by:
>
>     String charset = db.getStorage().getConfiguration().getCharset();
>
> > I found OBinarySerializer.bytesToString() and stringToBytes(), which appear to use a single-byte encoding for characters where possible. I think (but I can't say for certain) that this results in a charset-agnostic encoding of each char.
> >
> > The other option (the way I normally do this) is to use String.getBytes(charset), which we could do if there is a global DB charset setting. However, we would run into an issue: if the charset were ever changed, we might have to rewrite every string in the database.
>
> You're right. Maybe we could use UTF-8/16 as the charset, as a superset of all charsets?
>
> Lvc@

On 15/05/14 17:32, Luca Garulli wrote:

> Hi Steve,
> OrientDB already has a charset setting at the database level. To change it:
>
>     alter database charset utf-8
>
> Maybe we could treat char the way you did integer: save the bits if the content doesn't use 2 bytes.
>
> Lvc@

On 15 May 2014 04:17, Steve <[email protected]> wrote:

> I'm just adapting the existing binary field serializers to a modified interface and looking at the existing OStringSerializer. I notice it serializes char by char (i.e. 2 bytes per char). Given that under most charsets the vast majority of text is represented as a single byte per character, I wonder if we could handle this safely using String.getBytes(charset).
>
> The question is: is there a charset that is a superset of all charsets? I.e., can we guarantee that the serialize/deserialize process will never lose or alter data? I'm not really an expert on charsets, so I thought I'd throw this one out there for input.
>
> We could specify a charset per cluster or per DB in the way that MySQL does. It would be a pain for the user to have to specify charsets by default, but if the user is charset aware then we can neatly sidestep this issue.
>
> Any ideas on the best way to handle this?
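[Editor's note: the two questions above — whether UTF-8 can losslessly hold any Java String, and how much space it saves over the 2-bytes-per-char scheme — can be sanity-checked with plain JDK calls. This is a hedged sketch, not OrientDB code; note that UTF-8 representing every Unicode code point is what matters for a Java String, but it does not make UTF-8 byte-compatible with legacy charsets.]

```java
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Encode to UTF-8 and back; for any well-formed String this is lossless,
    // because UTF-8 can represent every Unicode code point.
    public static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Space used by a char-by-char serializer (2 bytes per UTF-16 char)...
    public static int twoBytesPerChar(String s) { return s.length() * 2; }

    // ...versus encoding through the JDK.
    public static int utf8Bytes(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    public static void main(String[] args) {
        String mixed = "Stefán Стефан 你好";   // Latin, Cyrillic, CJK
        if (!mixed.equals(roundTrip(mixed)))
            throw new AssertionError("lossy round trip");

        String ascii = "plain ascii field value";
        System.out.println(twoBytesPerChar(ascii)); // 46
        System.out.println(utf8Bytes(ascii));       // 23: half the space for ASCII-heavy data
    }
}
```

One remaining edge case: a String containing an unpaired surrogate is not well-formed UTF-16, and `getBytes` silently substitutes a replacement character rather than preserving it, so a serializer would want to reject or escape such values.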
> It would be a shame to double the storage size of every string in the DB if it's not necessary.

On 15/05/14 01:22, Luca Garulli wrote:

> Hi Steve,
> I guessed you were super busy, no problem. The binary protocol will be the first thing Emanuele works on, starting at the end of May. Very soon he'll contact you for information about the last version you pushed. He'll help you integrate your implementation into OrientDB so that all the test cases pass (thousands of them).
>
> Thanks,
> Lvc@

On 14 May 2014 13:26, Steve <[email protected]> wrote:

> If I read his last email on the subject correctly, he already has.
>
> Again, sorry to Luca for not responding; I missed the email when he sent it.

On 14/05/14 21:19, [email protected] wrote:

> Hi,
>
> This is good news; now let's hope Luca can find resources for this soon.
>
> Regards,
> -Stefán

On Wednesday, 14 May 2014 11:10:55 UTC, Steve Coughlan wrote:

> Hi Stefan,
>
> Progress has been slow, as I ran into the usual cycle: got bogged down in issues, became obsessed, ended up spending far more time than I expected, caught flak from my employer for neglecting my work, panicked to catch up, and never got back to it ;)
>
> However, I did push an update a couple of days ago. Although many of the extras have not been addressed, I'm now able to persist a binary record inside OrientDB and retrieve it after a restart (proving that it's deserialized from disk, not from cache). This implies also being able to persist the drastically altered schema structure.
>
> Since I had made the field-level serializer pluggable, I've been using jackson-json as the serialization mechanism for easy debugging. Now I need to adjust the existing ODB binary serializers. They all embed the data length in the serialized data, which we don't need to do since we store it in headers. I've also adjusted the interface slightly.
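[Editor's note: a minimal sketch of the "lengths live in the header" idea Steve describes — the header stores the field count and per-field lengths, so the payload bytes carry no embedded length prefix. All names here are hypothetical, and the single-byte counts/lengths (< 128) are a toy simplification; a real format would use varints or fixed-width integers.]

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class HeaderRecord {
    // Serialize fields as: [count][len_0 .. len_n-1][payload bytes].
    // Toy sketch: count and each length must fit in a single byte (< 128).
    public static byte[] write(byte[][] fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(fields.length);                    // header: field count
        for (byte[] f : fields) out.write(f.length); // header: per-field length
        for (byte[] f : fields) out.writeBytes(f);   // payload: raw bytes, no prefix
        return out.toByteArray();
    }

    public static byte[][] read(byte[] record) {
        int count = record[0];
        byte[][] fields = new byte[count][];
        int pos = 1 + count;                         // payload starts after header
        for (int i = 0; i < count; i++) {
            int len = record[1 + i];
            fields[i] = Arrays.copyOfRange(record, pos, pos + len);
            pos += len;
        }
        return fields;
    }
}
```

The point of the layout is that a reader can locate any field from the header alone, and field serializers stay length-free.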
> So I just need to massage the existing binary serializers a little to fit the new interface, and we will be back to full binary serialization.
>
> So... some progress, nowhere near as much as I'd hoped, but now that it actually works inside ODB (before, we could only serialize/deserialize to byte arrays using dummy schema objects) I believe it's at a point where we can get other ODB developers involved to review/test/contribute.
>
> I've just noticed a post Luca made a while back, which I'd missed, saying he'd employed someone who'll be focused on this, so I hope we can work together on the rest of the integration. Honestly, integration has been the hardest part. I've learned an awful lot about the internals of ODB the hard way (apologies for the blunt comment, but the documentation is awful and it's very hard to distinguish internal from public API), and I've also learned that I've probably only touched a tiny fraction of it.

On 14/05/14 19:40, [email protected] wrote:

> Hi,
>
> Has something newsworthy happened on this? :)
>
> Best regards,
> -Stefán

On Friday, 18 April 2014 13:57:07 UTC, Lvc@ wrote:

> > Slightly different issue, I think. I wasn't clear: I was actually talking about versioning of individual class schemas rather than a global schema version. This is the part that allows modifying the schema while (in some cases) avoiding a scan/rewrite of all records in the class. Although this is a nice feature to have, it's really quite a separate problem from binary serialization, so I decided to treat them as separate issues, since trying to deal with both at once was really bogging me down. Looking at your issue, though, I'd note that my subclasses of OClassImpl and OPropertyImpl are actually immutable once constructed, so this might help with schema-wide immutability.
>
> Good, this would simplify that issue.
>
> > Also realised that per-record compression will be rather easy to do...
> > But that's in the extras bucket, so I'll leave it as a bonus prize once the core functions are sorted and stable.
>
> We already have per-record compression; what do you mean?
>
> > I wasn't aware of this. Perhaps this occurs in the raw database layer of the code? I haven't come across any compression code. If you already have per-record compression, does this negate any potential value in per-field compression? I.e. if (string.length > 1000) compressString()
>
> We compress at the storage level, but always, not with a threshold. This yields no compression benefit for small records, so compression at marshalling time would be preferable: drivers could send compressed records to improve network I/O.
>
> Lvc@
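[Editor's note: the threshold idea Steve gestures at above (`if (string.length > 1000) compressString()`) could be sketched with the JDK's deflate support as follows. The class name and the 1000-byte cutoff are illustrative only, and a real format would need a flag bit per value recording whether it was compressed, so the reader knows whether to inflate.]

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class FieldCompression {
    static final int THRESHOLD = 1000; // bytes; smaller values are stored as-is

    // Compress only when the value is large enough for deflate to pay off;
    // small values skip compression entirely, matching Luca's point that
    // always-on compression yields no benefit for small records.
    public static byte[] maybeCompress(byte[] value) {
        if (value.length <= THRESHOLD) return value;
        Deflater deflater = new Deflater();
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }
}
```

Doing this at marshalling time, as Luca suggests, would also let drivers ship the already-compressed bytes over the network.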
