Advice for storing records in Riak

Toby Corkindale Fri, 12 Apr 2013 00:11:02 -0700

Hi,

I wondered if I could get a little advice on good practices for storing my records in Riak, such that they perform reasonably well in map-reduce queries?

I have a little over 200 million records, currently stored in a regular SQL database. I'm expecting this dataset to continue to grow, of course. Each record is reasonably small: some get up to a couple of hundred bytes, but most are smaller, and consist of around a dozen numeric fields and some small alphanumeric identifier fields.

My initial trial of importing these into Riak was to take each database row and convert it into a small JSON object of key => value pairs.


I'm finding two issues with this, though.

1) It takes a really long time to import everything into Riak, at least compared to ingesting into PostgreSQL. (I'm using Riak's HTTP API.)

2) An initial trial of some map-reduce queries was significantly slower than I was hoping; I suspect this is because of my data structure, though. My initial map phase was iterating over a high percentage of the keys, decoding the JSON, and then returning just one or two of the fields from the JSON structure, which is maybe an inefficient way to go about things?
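To make the second issue concrete, here is a rough plain-Python model of what that map phase does for each object. It is not Riak code (Riak map functions are written in JavaScript or Erlang); the function name and the sample field names are made up for illustration.

```python
import json

def map_phase(stored_values, wanted_fields):
    """Model of the map step: decode every JSON value, keep only some fields."""
    results = []
    for raw in stored_values:
        # The whole document is decoded even though most fields are discarded.
        record = json.loads(raw)
        results.append({name: record[name] for name in wanted_fields if name in record})
    return results

# Hypothetical stored values, one JSON document per key.
stored = [
    '{"field1": "foo", "field2": "bar", "field3": "baz"}',
    '{"field1": "qux", "field2": "quux", "field3": "corge"}',
]
print(map_phase(stored, ["field1"]))
```

The point of the sketch is only that the decode cost is paid for every field of every object visited, regardless of how few fields the query needs.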

So I was wondering if there's a better way to approach the problem. I wondered about breaking up the records further, and storing individual fields against keys, rather than the whole record as a JSON object.


E.g. this was my initial method:
Key: {id}:{recordtype}:{recordid}
Value: { field1: "foo", field2: "bar", field3: "baz" }
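As a sketch of that initial method, this builds the key and a PUT request for one record without sending it. The host, port, bucket name, sample ids and the /buckets/<bucket>/keys/<key> path are assumptions to check against the HTTP API documentation for the Riak version in use; build_put is a made-up helper name.

```python
import json
import urllib.request

def build_put(base_url, bucket, id_, record_type, record_id, fields):
    """Build (but do not send) a PUT storing one whole record as JSON."""
    key = f"{id_}:{record_type}:{record_id}"
    # Assumed URL layout; verify against your Riak version's HTTP API docs.
    url = f"{base_url}/buckets/{bucket}/keys/{key}"
    body = json.dumps(fields).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    return key, request

# Hypothetical values throughout.
key, request = build_put(
    "http://localhost:8098", "records", 42, "order", 1001,
    {"field1": "foo", "field2": "bar", "field3": "baz"},
)
print(key, request.get_method(), request.full_url)
```

With this layout every record costs one HTTP round trip on import, which is likely part of why ingestion feels slow next to a bulk load into PostgreSQL.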

I wondered about this, creating one key for each field:
{id}:{recordtype}:{recordid}:field1 ==> "foo"
{id}:{recordtype}:{recordid}:field2 ==> "bar"
{id}:{recordtype}:{recordid}:field3 ==> "baz"

That would avoid the need for one of the map phases; but on the other hand, now I'd be creating an order of magnitude more overall keys in the db.
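A small sketch of the one-key-per-field layout, plus the rough key count it implies. The helper name and sample ids are made up, and the totals use the approximate figures from above (about 200 million records, roughly a dozen fields each), so treat the result as an estimate only.

```python
def explode_record(id_, record_type, record_id, fields):
    """Turn one record into one key/value pair per field."""
    prefix = f"{id_}:{record_type}:{record_id}"
    return {f"{prefix}:{name}": value for name, value in fields.items()}

# Hypothetical record.
pairs = explode_record(42, "order", 1001,
                       {"field1": "foo", "field2": "bar", "field3": "baz"})
for key, value in pairs.items():
    print(key, "==>", value)

# Rough estimate, assuming 200 million records and 12 fields per record.
RECORDS = 200_000_000
FIELDS_PER_RECORD = 12
total_keys = RECORDS * FIELDS_PER_RECORD
print(f"{total_keys:,} keys instead of {RECORDS:,}")
```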

Alternatively, I wondered about going the other way, and grouping records under one key. So instead of having keys 100, 101, 102 .. 109, I would have one key 10x that contained a JSON structure with an array of records. (I don't know whether I'd store 10, 50 or 100 records per key.)
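A minimal sketch of the grouping idea, assuming integer record ids and a group size that is a power of ten so that the "10x" naming works. The function names and the recordid field are made up for illustration.

```python
import json
from collections import defaultdict

def group_key(record_id, group_size=10):
    """Map ids 100..109 to '10x' (group_size must be a power of ten)."""
    digits = len(str(group_size)) - 1
    return f"{record_id // group_size}{'x' * digits}"

def group_records(records, group_size=10):
    """Bundle records into one JSON array per group key."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record["recordid"], group_size)].append(record)
    return {key: json.dumps(rows) for key, rows in groups.items()}

# Hypothetical records with ids 100..111.
records = [{"recordid": rid, "field1": "foo"} for rid in range(100, 112)]
grouped = group_records(records)
print(sorted(grouped))
```

Twelve records become two Riak objects here, so the number of writes on import drops by roughly the group size.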

This would speed up the time taken to ingest data to Riak, and reduce the number of total queries made by the map phase, but would increase the work done in the map phase and add inefficiencies, as sometimes only a few rows of the set would actually be required for a given query.
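The trade-off can be put into rough numbers. This sketch counts, for a set of wanted record ids, how many keys would be fetched and how many decoded rows would go unused at different group sizes. It assumes every group is full and that ids are integers; fetch_cost and the sample ids are made up.

```python
def fetch_cost(needed_ids, group_size):
    """Return (keys fetched, rows decoded, rows decoded but not needed)."""
    groups = {rid // group_size for rid in needed_ids}
    keys_fetched = len(groups)
    rows_decoded = keys_fetched * group_size  # assumes every group is full
    wasted_rows = rows_decoded - len(set(needed_ids))
    return keys_fetched, rows_decoded, wasted_rows

# Hypothetical query that needs three scattered records.
needed = [100, 105, 230]
for size in (1, 10, 100):
    print(size, fetch_cost(needed, size))
```

For scattered lookups the wasted decoding grows quickly with group size, while queries that scan most of the keys would waste very little, so the right size depends on the typical query.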

And the third consideration is that maybe I just need to scale up the cluster size to have more machines. Currently it's running on a small cluster of four nodes while trialling Riak. (And I'm comparing performance with a single, but significantly more powerful, PostgreSQL node.)

There's nothing to stop me trying out all these methods, but I thought I'd poll the community for advice, since others have no doubt implemented similar things before and know roughly what may or may not work well.



Thanks,
Toby
