Not many details there, sorry; these "shapes" just caught my attention.
On Friday, 21 March 2014 21:55:28 UTC, [email protected] wrote:
>
> Hi,
>
> I came across this and found it interesting, perhaps you will too.
>
> https://www.arangodb.org/2012/07/11/infographic-comparing-space-usage-mongodb-couchdb-arangodb
>
> Regards,
> -Stefán
>
> On Tuesday, 18 March 2014 06:58:08 UTC, Steve Coughlan wrote:
>>
>> Sure... I'll clean it up a bit over the next few days and document it a bit
>> more clearly. Then I'll push it. No sense trying to figure out the whole
>> API by myself when there's people who know it inside out. Good thinking ;)
>>
>> On 18/03/14 15:28, Luca Garulli wrote:
>>
>> Hi Steve,
>> we're still closing 1.7, so we have a few months for 2.0. What I'd like
>> is a new branch where everybody can look, contribute and compare approaches.
>>
>> Could you push it on a branch in GitHub? WDYT?
>>
>> Lvc@
>>
>> On 18 March 2014 06:21, Steve <[email protected]> wrote:
>>
>> Hi Luca,
>>
>> Apologies. I was able to make a PoC ok but I ran into lots of
>> difficulties trying to integrate it into orient code (mainly to do with
>> working out how not to break binary compatibility with the existing binary
>> protocol). Then work went insanely busy for a while and I haven't got back
>> to it yet. What sort of timelines are you looking at for v2.0? I know you are
>> keen to get something like this into that version and obviously if you use
>> my work your devs will need plenty of time to review and tweak it. If you
>> can tell me how long you have then I can give you an idea whether I think I
>> can realistically deliver or not. I would like to do this but I don't want
>> to leave you waiting for me if you have resources that could do it sooner.
>>
>> regards,
>>
>> Steve
>>
>> On 18/03/14 15:09, Luca Garulli wrote:
>>
>> Hi Steve,
>> have you had the chance to play with this? Any updates?
>>
>> Lvc@
>>
>> On 21 February 2014 19:01, Luca Garulli <[email protected]> wrote:
>>
>> On 20 February 2014 13:24, Steve <[email protected]> wrote:
>>
>> Hi Andrey,
>>
>> I forked orient-core today and spent most of the day playing around with
>> the source trying to work out how to change over my pseudo schema,
>> property, type classes into OSchema, OProperty, OType.
>> ORecordSerializerDocument2Binary was very useful for understanding things.
>> Is it actually in use? I can't find any references to it.
>>
>> AFAIK it's not used. It was just a prototype.
>>
>> Could you explain *"We have many third party drivers for binary
>> protocol"* a bit more? Are there any examples?
>>
>> All the binary drivers handle the current serialization directly. The
>> content is sent in binary form to the client, which has to unmarshal it. So
>> all the binary drivers have implemented it.
>>
>> At the beginning we could marshal the content in the old form when we send
>> it to the clients, based on the client protocol version.
>>
>> I also have a question about ORID and whether it can be considered fixed
>> length. It contains OClusterPosition which has two implementations. One
>> is 8 bytes long and the other is 24 bytes long. For the purposes of
>> serialization we can't consider the ORID to be fixed length unless we
>> guarantee that every instance of ORID within a DB is only one of these
>> implementations. Is this the case?
>>
>> Consider it as fixed length, the longer one is not yet used.
>>
>> At the moment I'm also wrestling with what to do about null fixed length
>> fields and whether to reserve space inside a record. Whilst headers are
>> ordered by schema_fixed_length, schema_variable_length, schema_less fields
>> there's no reason data needs to follow the same order. But by default it
>> probably would.
>> Consider an object schema like this:
>> class SomeClass {
>>   update_time: DateTime //fixed length
>>   short_string: String
>>   massive_string: String
>> }
>>
>> If we first write the record and update_time is null we'd have something
>> like this
>> update_time:0 bytes|short_string: 10 bytes|massive_string:100kbytes
>>
>> Then, when we update it to add update_time, we have a few options.
>> 1/ When originally writing the object reserve space even though the value
>> is null (wasted space)
>> 2/ Search for a hole. e.g. if short_string has been set to null we could
>> steal its space.
>> 3/ Write the update_time field after massive_string (if there is space
>> before the beginning of the next record). Potentially we are writing into a
>> different disk block, so for future reads when we aren't interested in
>> massive_string we still have to load the block into memory.
>> 4/ Rewrite the entire record.
>>
>> I suppose it is worth considering whether there's a benefit to reserving
>> partial holes. i.e. if we have 10 * 4 byte nullable fixed length fields
>> (all null on initial write) should we take a guess and reserve say 10 out
>> of the 40 possible bytes for future updates? But I'm probably getting
>> ahead of myself. I'll work on a simple implementation first before trying
>> to be too clever ;)
>>
>> Good question.
>>
>> I think that reserving space for fixed length fields has the advantage
>> of keeping the fixed size area as is, and fixed length fields are usually
>> small, 8 bytes each at most.
>>
>> By the way, datetimes are now stored as long, so a -1 could probably
>> mean NULL. We should figure out how to represent NULL for each type.
>>
>> Lvc@
>>
>> On 20/02/14 20:12, Andrey Lomakin wrote:
>>
>> Hi Steve,
>> Good that you are going to help us.
>> A few additional points:
>> 1. We already have binary serialization support, you can see it here
>> com.orientechnologies.common.serialization.types.OBinarySerializer so
>> obviously we should not have several versions of the same thing. Also I think it
>> will be interesting for you to look at this issue and discussion here
>> https://github.com/orientechnologies/orientdb/issues/681#issuecomment-28466948.
>> We discussed serialization of a single record (sorry, I had no time to analyze
>> it deeply because of a lot of events) but in the case of an SQL query you have to
>> process millions of them.
>> 2. We are working on binary compatibility mechanics too (I mean
>> compatibility between storage formats); without it current users will not
>> be able to take advantage of new features, especially binary serialization.
>> 3. We have many third party drivers for the binary protocol (which pass
>> serialized records to the client's side) so we have to think about how not to break
>> the functionality of these drivers.
>>
>> On Wed, Feb 19, 2014 at 1:53 PM, Steve <[email protected]> wrote:
>>
>> Hi Luca,
>>
>> I'll give it a go with the real ODB code. The reason I didn't is because
>> I'm actually quite new to ODB even as an end user but your instructions
>> will set me in the right direction. Most of my experience with data
>> serialization formats has been with Bitcoin which was mostly for network
>> protocol use cases rather than big-data storage. But that was also a high
>> performance scenario so I guess there are a lot of parallels.
>>
>> On 19/02/14 21:33, Luca Garulli wrote:
>>
>> Hi Steve,
>> sorry for the delay.
>>
>> I like your ideas, I think this is the right direction. varint8 and
>> varint16 could be a good way to save space, but we should consider when
>> this slows down some use cases, like partial field loading.
>>
>> About the POC you created I think it would be much more useful if you
>> play with real documents.
>> It's easy and you could push it to a separate
>> branch to let us and other developers contribute & test. WDYT?
>>
>> Follow these steps:
>>
>> (1) create your serializer
>>
>> This is the skeleton of the class to implement:
>>
>> public class BinaryDocumentSerializer implements ORecordSerializer {
>>   public static final String NAME = "binarydoc";
>>
>>   // UN-MARSHALLING
>>   public ORecordInternal<?> fromStream(final byte[] iSource) {
>>   }
>>
>>   // PARTIAL UN-MARSHALLING
>>   public ORecordInternal<?> fromStream(final byte[] iSource, final ORecordInternal<?> iRecord, String[] iFields) {
>>   }
>>
>>   // MARSHALLING
>>   public byte[] toStream(final ORecordInternal<?> iSource, boolean iOnlyDelta) {
>>   }
>> }
>>
>> (2) register your implementation
>>
>> ORecordSerializerFactory.instance().register(BinaryDocumentSerializer.NAME, new BinaryDocumentSerializer());
>>
>> (3) create a new ODocument subclass
>>
>> Then create a new class that extends ODocument but uses your
>> implementation:
>>
>> public class BinaryDocument extends ODocument {
>>   protected void setup() {
>>     super.setup();
>>     _recordFormat = ORecordSerializerFactory.instance().getFormat(BinaryDocumentSerializer.NAME);
>>   }
>> }
>>
>> (4) Try it!
>>
>> And now try to create a BinaryDocument, set fields and call .save().
>> The method BinaryDocumentSerializer.toStream() will be called.
>>
>> ...
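For anyone following the null-handling part of the thread (option 1/, reserving space for a null fixed-length field, combined with Luca's idea that -1 could mean NULL for datetimes stored as long), here is a rough Java sketch of how that might look. It is only an illustration under stated assumptions and is not OrientDB code: the class FixedSlotRecord, its byte layout, and the NULL_DATETIME constant are all hypothetical names made up for this example, and it only works if -1 is never a legitimate timestamp.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout for illustration only (not OrientDB's record format):
// [update_time: 8 bytes, always reserved][short_string length: 4 bytes][short_string bytes]
final class FixedSlotRecord {
    // Sentinel suggested in the thread; assumes -1 is never a real timestamp.
    static final long NULL_DATETIME = -1L;
    static final int UPDATE_TIME_OFFSET = 0;

    // Serialize a record, reserving the 8-byte slot even when updateTime is null.
    static byte[] write(Long updateTime, String shortString) {
        byte[] text = shortString.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + text.length);
        buf.putLong(updateTime == null ? NULL_DATETIME : updateTime);
        buf.putInt(text.length);
        buf.put(text);
        return buf.array();
    }

    // Read only the fixed-length field; the variable-length area is not touched.
    static Long readUpdateTime(byte[] record) {
        long raw = ByteBuffer.wrap(record).getLong(UPDATE_TIME_OFFSET);
        return raw == NULL_DATETIME ? null : raw;
    }

    // Set update_time in place: record size and the other offsets do not change.
    static void setUpdateTime(byte[] record, long updateTime) {
        ByteBuffer.wrap(record).putLong(UPDATE_TIME_OFFSET, updateTime);
    }

    // Read the variable-length string that follows the fixed-size area.
    static String readShortString(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        buf.position(Long.BYTES);
        int length = buf.getInt();
        byte[] text = new byte[length];
        buf.get(text);
        return new String(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] record = write(null, "hello");
        System.out.println("before update: " + readUpdateTime(record));
        setUpdateTime(record, System.currentTimeMillis());
        System.out.println("after update:  " + readUpdateTime(record));
        System.out.println("short_string:  " + readShortString(record));
    }
}
```

The trade-off is the one discussed above: 8 bytes are spent on every record whose update_time is null, but a later update never has to move short_string or massive_string. A sentinel such as Long.MIN_VALUE might be a safer choice than -1 if timestamps just before the epoch have to be representable.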
