Herman,

First: Note that nodejs, while asynchronous, is single threaded. Make sure 
you're not overwhelming your server (100% CPU, swapping memory) or running 
into internal http connection pool limits (Node's default is 5 simultaneous 
connections per host; I bump mine to 200). If you are, look at cluster2 for 
multi-process clustering on a single server, and at proxies like haproxy for 
multi-server clustering.
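
For the connection pool limit, a minimal sketch using Node's built-in http 
module. The value 200 just mirrors the figure mentioned above, and the helper 
name makeRiakAgent is hypothetical; the right number depends on your load.

```javascript
const http = require('http');

// Raise the socket limit on the shared default agent.
http.globalAgent.maxSockets = 200;

// Alternatively, build a dedicated agent for Riak traffic only
// (makeRiakAgent is a hypothetical helper name).
function makeRiakAgent(maxSockets) {
  return new http.Agent({ maxSockets: maxSockets });
}

const riakAgent = makeRiakAgent(200);
// The agent is then passed per request through the `agent` option of
// http.request() or http.get().
```

A dedicated agent keeps the Riak pool separate from any other outgoing http 
traffic the process makes.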

Second: It seems to me you're begging for concurrency issues doing it this 
way, since every new comment rewrites the same object. It would be much better 
to have, say, a bucket per parent url with one key per comment (I don't think 
riak has any bucket count limits, right?). The on-disk storage uses bucket+key 
as the storage key anyway, so this gives you one level of key-filtering (which 
riak isn't amazing at) for free. 
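
As a rough sketch of the bucket-per-parent-url layout: the helper names 
(bucketFor, commentKey, commentPath) and the timestamp-plus-suffix key format 
are assumptions for illustration, and the /riak/<bucket>/<key> path follows the 
URL form used in the original message.

```javascript
// Hypothetical helpers: one bucket per parent URL, one key per comment.
function bucketFor(parentUrl) {
  // URL-encode so the parent URL is safe as a single path segment
  return encodeURIComponent(parentUrl);
}

function commentKey(date, suffix) {
  // ISO timestamp first so keys sort chronologically; the suffix is any
  // caller-supplied string that keeps simultaneous comments from colliding
  return date.toISOString() + '-' + suffix;
}

function commentPath(parentUrl, key) {
  return '/riak/' + bucketFor(parentUrl) + '/' + encodeURIComponent(key);
}
```

With this layout each new comment is a single PUT to a fresh key, so writers 
never have to read and rewrite a shared object.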

Other options: Secondary indexes (2i) with the parent url (pretty doable), or 
riak-search (probably overkill). 
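
A minimal sketch of the 2i option over Riak's HTTP interface, assuming the 
x-riak-index-<field>_bin header and the /buckets/<bucket>/index/<field>/<value> 
query path. The index field name parent_url, the bucket name passed in, and 
both helper names are hypothetical, and it is worth checking how your Riak 
version treats special characters in index values.

```javascript
// Build the pieces of an HTTP PUT that tags a comment with a 2i entry.
function indexedCommentPut(bucket, key, parentUrl, comment) {
  return {
    method: 'PUT',
    path: '/buckets/' + encodeURIComponent(bucket) +
          '/keys/' + encodeURIComponent(key),
    headers: {
      'Content-Type': 'application/json',
      // Binary (string) index; the field name "parent_url" is an assumption
      'x-riak-index-parent_url_bin': parentUrl
    },
    body: JSON.stringify(comment)
  };
}

// Path for an exact-match 2i query listing the keys for one parent URL.
function indexQueryPath(bucket, parentUrl) {
  return '/buckets/' + encodeURIComponent(bucket) +
         '/index/parent_url_bin/' + encodeURIComponent(parentUrl);
}
```

The query returns keys only, so each matching comment still needs its own GET 
(or a MapReduce job fed by the index).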

Also, make sure you're actually distributing your requests across multiple riak 
nodes; the client may or may not handle this for you. I run my riak nodes 
behind haproxy to distribute the requests (although I'm using riakjs, so 
client behavior will differ).
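
If a proxy is not an option, a client-side round-robin picker is one simple way 
to spread requests. This is only a sketch: the helper name makeRoundRobin and 
the host names are hypothetical, 8098 is assumed as the Riak HTTP port, and it 
does no health checking, which a proxy like haproxy would give you.

```javascript
// Returns a function that cycles through the given nodes in order.
function makeRoundRobin(nodes) {
  if (!nodes.length) {
    throw new Error('need at least one node');
  }
  let next = 0;
  return function pick() {
    const node = nodes[next];
    next = (next + 1) % nodes.length; // wrap around after the last node
    return node;
  };
}

// Hypothetical node list; call pickNode() before each Riak request.
const pickNode = makeRoundRobin([
  { host: 'riak1.example', port: 8098 },
  { host: 'riak2.example', port: 8098 },
  { host: 'riak3.example', port: 8098 }
]);
```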

If you want more information on the nodejs specifics, I can probably give you 
some pointers from my codebase. Other users are probably much more 
knowledgeable about effectively implementing 2i and riak-search (I've only 
played with them).

- Alex

----- Original Message -----
From: "Herman Junge" <[email protected]>
To: [email protected]
Sent: Tuesday, October 30, 2012 11:27:51 AM
Subject: Using riak as a `Comment` Store - Slow results



Hi list. 

I am doing research on using riak as a solution to store comments. 
Unfortunately, my results were far from favorable. I will describe the 
architecture I used, the schemas chosen, the steps taken and the results, 
hoping to get feedback from basho or any experienced user on what to do to 
improve these times, or whether to discard riak as a store for comments. 

1. The problem 

Store comments. Given a _parent_url_ (which could be a blog post, an image, 
anything with an url), group its comments. 


2. Architecture 


2.1. Riak Database 

Used the joyent cloud and set up 4 SmartOS machines with 1024 MB RAM each. 
They have riak preinstalled. 


2.2. Client Application 

Application built in node.js, using the expressJS framework 
(https://github.com/visionmedia/express) to respond to HTTP requests 
(specifically PUT and GET). The Riak library is node_riak 
(https://github.com/mranney/node_riak), which has been `tested in combat` by 
its creators at voxer. 


The client application runs on another machine in the joyent cloud; this 
machine is an ubuntu 12.04 with 1024 MB RAM. 


3. Schemas Chosen 

I went with a very simple schema: since the comments are grouped by 
_parent_url_, I'm using parent_url as a key, its value being an array of the 
comments in json. 

An example for a key is: 
<server_url>/riak/parent_url/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2Ffile.html 

An example for a value is: 

{ "comments" : 
  [ 
    { "date" : "2012-10-30T14:50:11.898Z" 
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit." 
    , "author" : "John Doe" 
    } 
  , { "date" : "2012-10-30T14:50:11.898Z" 
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit." 
    , "author" : "John Doe" 
    } 
  , { "date" : "2012-10-30T14:50:11.898Z" 
    , "text" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit." 
    , "author" : "John Doe" 
    } 
  ] 
} 

4. Steps Taken 

4.1. Client API: 

My client API takes two requests: 

* PUT /comment 
* GET /comments/:parent_url?offset=<offset>&limit=<limit> 

4.1.1 PUT /comment 

Stores a comment under the parent_url given inside the request. I use 
node_riak's method `client.modify()`, which `GET`s the parent_url key to read 
its value, then applies the mutation (supplied by the library user; in this 
case it just pushes the json value of the comment onto the array), then `PUT`s 
the new value back to the parent_url key. 
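
The read-modify-write cycle described above can be sketched as follows. This is 
only an illustration with an in-memory Map standing in for riak; it is not 
node_riak's actual implementation, and the names store, modify and addComment 
are hypothetical.

```javascript
// In-memory stand-in for a Riak bucket (an assumption, not the node_riak API).
const store = new Map();

function modify(key, mutator) {
  // "GET": read the current value, or start with an empty comment list
  const current = store.has(key)
    ? JSON.parse(store.get(key))
    : { comments: [] };
  // Apply the caller's mutation
  const updated = mutator(current);
  // "PUT": write the whole value back under the same key
  store.set(key, JSON.stringify(updated));
  return updated;
}

function addComment(parentUrl, comment) {
  return modify(parentUrl, function (doc) {
    doc.comments.push(comment);
    return doc;
  });
}
```

Against a real store the GET and the PUT are separate network round trips, so 
two overlapping requests for the same parent_url can both read the same old 
value before either one writes.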

4.1.2 GET /comments/:parent_url?offset=<offset>&limit=<limit> 

GETs the comments for a given parent_url, starting at <offset> and returning 
at most <limit> comments. 

Internally I just issue a `GET` to riak; the controller of my client does the 
offset and limit extraction. 
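
A minimal sketch of that extraction step, assuming <limit> is a count of 
comments rather than an end index. The function name paginate is hypothetical; 
query-string values arrive as strings, hence the Number() conversion.

```javascript
// Slice one page out of the full comment array fetched from riak.
function paginate(comments, offset, limit) {
  const start = Math.max(0, Number(offset) || 0); // bad input falls back to 0
  const count = Math.max(0, Number(limit) || 0);
  return comments.slice(start, start + count);
}
```

Note that the whole array is still fetched and parsed on every request, 
whatever page is asked for.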

4.2. The Stress Test 

Provisioned a new joyent machine (an Ubuntu 12.04 with 1024 MB RAM) just to 
run `ab` stress tests. 

I ran six tests: 

API method   Nº of requests   Concurrency 
PUT (*1)     10,000           5 
PUT (*1)     10,000           50 
PUT (*1)     10,000           500 
GET (*2)     10,000           5 
GET (*2)     10,000           50 
GET (*2)     10,000           500 

(*1) PUT /comment 
(*2) GET /comments/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2F1111.html?offset=25&limit=20 


5. Results 

The following tables show the results I got on each test: 

PUT, 10000 requests, concurrency 5 

Percentile   Time (ms) 
50%          116 
65%          142 
70%          161 
85%          177 
90%          274 
95%          486 
98%          751 
99%          1165 
100%         1065 

PUT, 10000 requests, concurrency 50 

Percentile   Time (ms) 
50%          1879 
65%          1990 
70%          2068 
85%          2124 
90%          2364 
95%          2734 
98%          4062 
99%          4591 
100%         11258 

PUT, 10000 requests, concurrency 500 

Percentile   Time (ms) 
50%          20876 
65%          21491 
70%          21919 
85%          22202 
90%          23136 
95%          23914 
98%          25036 
99%          25835 
100%         29611 

GET, 10000 requests 

Percentile   Time (ms) 
50%          68 
65%          75 
70%          80 
85%          83 
90%          94 
95%          107 
98%          145 
99%          475 
100%         535 

GET, 10000 requests 

Percentile   Time (ms) 
50%          631 
65%          673 
70%          701 
85%          719 
90%          783 
95%          913 
98%          1054 
99%          1099 
100%         1265 

GET, 10000 requests 

Percentile   Time (ms) 
50%          6363 
65%          6636 
70%          6820 
85%          6934 
90%          7218 
95%          7442 
98%          7691 
99%          7836 
100%         8435 


6. Conclusion 

At first sight, I'm getting very unfavorable results (compared with a single 
unconfigured MySQL table under the very same requests). So I'm requesting 
feedback from you: 

a) Is it a good idea to use Riak as a comment store? 

b) Are these times expected? (In other words, where am I making a big mistake?) 

Regards, 

Herman Junge 
@hermanjunge 


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
