Re: How to best store arbitrarily large Java objects

2016-07-22 Thread Russell Brown

On 22 Jul 2016, at 09:12, Henning Verbeek wrote:

> Alex,
> thanks for the very quick response.
> 
> On Thu, Jul 21, 2016 at 5:36 PM, Alex Moore wrote:
>>> I'm beginning to think that I'll need to remodel my data and use CRDTs
>>> for individual fields such as the `TreeMap`. Would that be a better
>>> way?
>> 
>> 
>> This sounds like a plausible idea.  If you do a lot of possibly conflicting
>> updates to the Tree, then a CRDT map would be the way to go.  You could
>> reuse the key from the main object, and just put it in the new
>> buckettype/bucket.
> 
> Looking at the 
> [documentation](http://docs.basho.com/riak/kv/2.1.4/developing/data-types/maps/)
> I assume there are no limits to the number of entries, right?

There is a size limit, as a map is just a Riak object like any other. We’re 
working on decomposed CRDTs, where the Set/Map/etc. are split across many keys. 
We expect Sets to arrive soon; Maps are a little further out.

> 
>> If you don't need to update the tree much, you could also just serialize the
>> tree into its own object - split up the static data and the often updated
>> data, and put them in different buckets that share the same key.
> 
> The tree is built once and read often, rarely appended to. The problem
> with splitting up the object is that the tree makes up about 95% of
> the size, so unless I can split up the tree, it won't help much.

Splitting up CRDTs that are related is probably going to be a problem too, as 
they need to share some common causal information to merge correctly. See above.

> 
> Thanks again!
> Henning
> 
> PS: It'd be great to have a `Converter` that can be instructed to map
> fields to CRDT through annotations :)

Is there not a Java converter that maps an object to a CRDT map already? That 
would seem like a nice thing to have. You’d be limited to 
sets/registers/booleans/counters/maps as field types, but it should work nicely.
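
To make the idea concrete, here is a minimal sketch of the annotation-scanning half of such a converter. Everything in it is an assumption for illustration: the `@CrdtField` annotation, the `CrdtMapper` class and the `Profile` example are hypothetical names and are not part of the Riak Java client. It only decides which data type each annotated field would map to; turning that into actual map updates is left out. Treating integer fields as counters is also just one possible choice, since a register would work as well.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical annotation: marks a field to be stored as an entry of a CRDT map.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CrdtField {
    String name();
}

// Hypothetical helper: inspects a class and reports the data type per annotated field.
class CrdtMapper {
    enum Kind { REGISTER, COUNTER, FLAG, SET, MAP }

    // Chooses a data type from the Java field type (an assumed mapping).
    static Kind kindOf(Class<?> type) {
        if (type == boolean.class || type == Boolean.class) return Kind.FLAG;
        if (type == int.class || type == Integer.class
                || type == long.class || type == Long.class) return Kind.COUNTER;
        if (type == String.class) return Kind.REGISTER;
        if (Set.class.isAssignableFrom(type)) return Kind.SET;
        if (Map.class.isAssignableFrom(type)) return Kind.MAP;
        throw new IllegalArgumentException("unsupported field type: " + type.getName());
    }

    // Returns CRDT entry name -> data type for every annotated field, sorted by name.
    static Map<String, Kind> describe(Class<?> cls) {
        Map<String, Kind> out = new TreeMap<>();
        for (Field f : cls.getDeclaredFields()) {
            CrdtField a = f.getAnnotation(CrdtField.class);
            if (a != null) {
                out.put(a.name(), kindOf(f.getType()));
            }
        }
        return out;
    }
}

// Example domain class; only annotated fields are considered.
class Profile {
    @CrdtField(name = "displayName") String displayName;
    @CrdtField(name = "visits") long visits;
    @CrdtField(name = "active") boolean active;
    @CrdtField(name = "tags") Set<String> tags;
    @CrdtField(name = "tree") TreeMap<String, String> tree;
    Object cache; // not annotated, so it is skipped
}
```

Calling `CrdtMapper.describe(Profile.class)` yields one entry per annotated field, which a real converter could then walk to build the corresponding map update.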

> 
> -- 
> My other signature is a regular expression.
> http://www.pray4snow.de
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




Re: How to best store arbitrarily large Java objects

2016-07-22 Thread Henning Verbeek
Alex,
thanks for the very quick response.

On Thu, Jul 21, 2016 at 5:36 PM, Alex Moore wrote:
>> I'm beginning to think that I'll need to remodel my data and use CRDTs
>> for individual fields such as the `TreeMap`. Would that be a better
>> way?
>
>
> This sounds like a plausible idea.  If you do a lot of possibly conflicting
> updates to the Tree, then a CRDT map would be the way to go.  You could
> reuse the key from the main object, and just put it in the new
> buckettype/bucket.

Looking at the 
[documentation](http://docs.basho.com/riak/kv/2.1.4/developing/data-types/maps/)
I assume there are no limits to the number of entries, right?

> If you don't need to update the tree much, you could also just serialize the
> tree into its own object - split up the static data and the often updated
> data, and put them in different buckets that share the same key.

The tree is built once and read often, rarely appended to. The problem
with splitting up the object is that the tree makes up about 95% of
the size, so unless I can split up the tree, it won't help much.
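
As a rough sketch of what splitting the tree itself could look like on the Java side: partition the `TreeMap` into consecutive key-range segments and keep a small index from each segment's first key to its segment number. The `TreePartitioner` class and its methods are hypothetical names, not Riak client API, and the limit here is a number of entries rather than a byte size, which is a simplification. Each segment would then be serialized and stored as its own object under a derived key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not part of the Riak Java client.
class TreePartitioner {

    // Splits a sorted map into consecutive segments of at most maxEntries entries each.
    static <K, V> List<TreeMap<K, V>> partition(TreeMap<K, V> tree, int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        List<TreeMap<K, V>> segments = new ArrayList<>();
        TreeMap<K, V> current = new TreeMap<K, V>(tree.comparator());
        for (Map.Entry<K, V> e : tree.entrySet()) {
            if (current.size() == maxEntries) {
                segments.add(current);
                current = new TreeMap<K, V>(tree.comparator());
            }
            current.put(e.getKey(), e.getValue());
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }

    // Builds an index: first key of each segment -> segment number.
    static <K, V> TreeMap<K, Integer> index(List<TreeMap<K, V>> segments) {
        Comparator<? super K> cmp = segments.isEmpty() ? null : segments.get(0).comparator();
        TreeMap<K, Integer> idx = new TreeMap<K, Integer>(cmp);
        for (int i = 0; i < segments.size(); i++) {
            idx.put(segments.get(i).firstKey(), i);
        }
        return idx;
    }

    // Returns the segment that could contain the key, or -1 if the key sorts before all segments.
    static <K> int segmentFor(TreeMap<K, Integer> idx, K key) {
        Map.Entry<K, Integer> e = idx.floorEntry(key);
        return e == null ? -1 : e.getValue();
    }
}
```

With the index stored alongside the other fields, a read for a single entry only needs the index plus one segment, and an append only rewrites the last segment. The segments are written independently, so a reader may briefly see an index and segments that do not match.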

Thanks again!
Henning

PS: It'd be great to have a `Converter` that can be instructed to map
fields to CRDT through annotations :)

-- 
My other signature is a regular expression.
http://www.pray4snow.de



Re: How to best store arbitrarily large Java objects

2016-07-21 Thread Alex Moore
Hi Henning,

Responses inline:

...

> However, depending on the size of the `TreeMap`, the serialization
> output can become rather large, and this limits the usefulness of my
> object. In our tests, dealing with Riak-objects >2MB proved to be
> significantly slower than dealing with objects <200kB.


Yes. We usually recommend keeping objects < 100kB for the best
performance; and Riak can usually withstand objects up to 1MB with the
understanding that everything will be a little slower with the larger
objects going around the system.


> My idea was to use a converter that splits the serialized JSON into
> chunks during _write_, and uses links to point from one chunk to the
> next. During _fetch_ the links would be traversed, the JSON string
> concatenated from chunks, deserialized and the object would be
> returned. Looking at `com.basho.riak.client.api.convert.Converter`, it
> seems this is not going to work.


Linkwalking was deprecated in Riak 2.0 so I wouldn't do it that way.
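
If you still want to chunk the serialized JSON, one alternative to links is a manifest: store the chunks under keys derived from the main key, and store the chunk count under the main key itself. Below is a minimal sketch of that idea, not a Riak client example. The `ChunkedStore` class, the `:chunk:` key suffix and the in-memory `HashMap` standing in for a bucket are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the HashMap stands in for a Riak bucket (key -> value bytes).
class ChunkedStore {
    private final Map<String, byte[]> bucket = new HashMap<>();
    private final int chunkSize;

    ChunkedStore(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Splits the UTF-8 bytes of the JSON into chunks, then writes the manifest last.
    void store(String key, String json) {
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        int count = (bytes.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < count; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, bytes.length);
            bucket.put(chunkKey(key, i), Arrays.copyOfRange(bytes, from, to));
        }
        // The manifest is just the chunk count, stored under the main key.
        bucket.put(key, Integer.toString(count).getBytes(StandardCharsets.UTF_8));
    }

    // Reads the manifest, fetches every chunk in order, and decodes the joined bytes.
    String fetch(String key) {
        byte[] manifest = bucket.get(key);
        if (manifest == null) {
            return null;
        }
        int count = Integer.parseInt(new String(manifest, StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            byte[] chunk = bucket.get(chunkKey(key, i));
            if (chunk == null) {
                throw new IllegalStateException("missing chunk " + i + " for key " + key);
            }
            out.write(chunk, 0, chunk.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Number of stored objects, manifest included.
    int objectCount() {
        return bucket.size();
    }

    private static String chunkKey(String key, int i) {
        return key + ":chunk:" + i;
    }
}
```

The bytes are joined before decoding, so a multi-byte character split across two chunks still round-trips. Keep in mind that writes to several keys are not atomic in Riak, so a real version would need to handle a manifest that points at chunks from a different write, and stale chunks left behind when an object shrinks.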

> I'm beginning to think that I'll need to remodel my data and use CRDTs
> for individual fields such as the `TreeMap`. Would that be a better
> way?


This sounds like a plausible idea.  If you do a lot of possibly conflicting
updates to the Tree, then a CRDT map would be the way to go.  You could
reuse the key from the main object, and just put it in the new
buckettype/bucket.

If you don't need to update the tree much, you could also just serialize
the tree into its own object - split up the static data and the often
updated data, and put them in different buckets that share the same key.

Thanks,
Alex


On Thu, Jul 21, 2016 at 9:36 AM, Henning Verbeek 
wrote:

> I have a Java class, which is being stored in Riak. The class contains
> a `TreeMap` field, amongst other fields. Out of the box, Riak is
> converting the object to/from JSON. Everything works fine.
>
> However, depending on the size of the `TreeMap`, the serialization
> output can become rather large, and this limits the usefulness of my
> object. In our tests, dealing with Riak-objects >2MB proved to be
> significantly slower than dealing with objects <200kB.
>
> So, in order to store/fetch instances of my class with arbitrary
> sizes, but with reliable performance, I believe I need to split the
> output into separate Riak-objects after serialization, and reassemble
> before deserialization.
>
> My idea was to use a converter that splits the serialized JSON into
> chunks during _write_, and uses links to point from one chunk to the
> next. During _fetch_ the links would be traversed, the JSON string
> concatenated from chunks, deserialized and the object would be
> returned. Looking at `com.basho.riak.client.api.convert.Converter`, it
> seems this is not going to work.
>
> I'm beginning to think that I'll need to remodel my data and use CRDTs
> for individual fields such as the `TreeMap`. Would that be a better
> way?
>
> Any other recommendations would be much appreciated.
>
> Thanks,
> Henning
> --
> My other signature is a regular expression.
> http://www.pray4snow.de
>


How to best store arbitrarily large Java objects

2016-07-21 Thread Henning Verbeek
I have a Java class, which is being stored in Riak. The class contains
a `TreeMap` field, amongst other fields. Out of the box, Riak is
converting the object to/from JSON. Everything works fine.

However, depending on the size of the `TreeMap`, the serialization
output can become rather large, and this limits the usefulness of my
object. In our tests, dealing with Riak-objects >2MB proved to be
significantly slower than dealing with objects <200kB.

So, in order to store/fetch instances of my class with arbitrary
sizes, but with reliable performance, I believe I need to split the
output into separate Riak-objects after serialization, and reassemble
before deserialization.

My idea was to use a converter that splits the serialized JSON into
chunks during _write_, and uses links to point from one chunk to the
next. During _fetch_ the links would be traversed, the JSON string
concatenated from chunks, deserialized and the object would be
returned. Looking at `com.basho.riak.client.api.convert.Converter`, it
seems this is not going to work.

I'm beginning to think that I'll need to remodel my data and use CRDTs
for individual fields such as the `TreeMap`. Would that be a better
way?

Any other recommendations would be much appreciated.

Thanks,
Henning
-- 
My other signature is a regular expression.
http://www.pray4snow.de
