Herman,

First: Note that Node.js, while asynchronous, is single-threaded. Make sure you're not overwhelming your server (100% CPU, swapping memory) or running into internal HTTP connection pool limits (Node's default is 5 simultaneous connections; I bump mine to 200). If you are, look at cluster2 for multi-process, single-server clustering, and proxies like haproxy for multi-server clustering.
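For reference, a minimal sketch of raising the connection pool limit mentioned above, using only Node's built-in `http` module. The value 200 is the one quoted above; the name `riakAgent` is hypothetical, and whether a given Riak client goes through the global agent or accepts a custom one is an assumption to check against that client's source.

```javascript
const http = require('http');

// Raise the per-host socket cap on the shared global agent.
// Any library that issues requests through the default agent picks this up.
http.globalAgent.maxSockets = 200;

// Alternative: a dedicated agent used only for Riak traffic (hypothetical name),
// so other outbound HTTP calls keep their own limits.
const riakAgent = new http.Agent({ maxSockets: 200, keepAlive: true });
```

The dedicated agent only helps if the client library lets you pass an `agent` option through to `http.request`; otherwise the global setting is the simpler route.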
Second: It seems to me you're begging for concurrency issues doing it this way. It would be much better to have, say, a bucket per parent URL (I don't think Riak has any bucket count limits, right?). The on-disk storage uses bucket+key as the storage key anyway, so this gives you one level of key filtering (which Riak isn't amazing at) for free. Other options: secondary indexes (2i) on the parent URL (pretty doable), or riak-search (probably overkill).

Also, make sure you're actually distributing your requests across multiple Riak nodes; the client may or may not handle this for you. I run my Riak nodes behind haproxy to distribute the requests (although I'm using riakjs, so client behavior will differ).

If you want more information on the Node.js specifics, I can probably give you some pointers from my codebase. Other users are probably much more knowledgeable about effectively implementing 2i and riak-search (I've only played with them).

- Alex

----- Original Message -----
From: "Herman Junge" <[email protected]>
To: [email protected]
Sent: Tuesday, October 30, 2012 11:27:51 AM
Subject: Using riak as a `Comment` Store - Slow results

Hi list.

I am researching Riak as a solution for storing comments. Unfortunately, my results were far from favorable. Below I describe the architecture I used, the schema chosen, the steps taken, and the results, hoping to get feedback from Basho or any experienced user on how to improve these times, or whether to discard Riak as a store for comments.

1. The problem

Store comments. Given a _parent_url_ (which could be a blog post, an image, anything with a URL), group its comments.

2. Architecture

2.1. Riak Database

I used the Joyent cloud and set up 4 SmartOS machines with 1024 MB RAM each. They have Riak preinstalled.

2.2. Client Application

The application is built in Node.js and uses the ExpressJS framework ( https://github.com/visionmedia/express ) to respond to HTTP requests (specifically PUT and GET).
The Riak library is node_riak ( https://github.com/mranney/node_riak ), which has been `tested in combat` by its creators at Voxer. The client application runs on another machine in the Joyent cloud, an Ubuntu 12.04 machine with 1024 MB RAM.

3. Schemas Chosen

I went with a very simple schema: since the comments are grouped by _parent_url_, I'm using parent_url as the key, its value being an array of the comments in JSON.

An example of a key is:

    <server_url>/riak/parent_url/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2Ffile.html

An example of a value is:

    {
      "comments": [
        {
          "date": "'2012-10-30T14:50:11.898Z",
          "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
          "author": "John Doe"
        },
        {
          "date": "'2012-10-30T14:50:11.898Z",
          "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
          "author": "John Doe"
        },
        {
          "date": "'2012-10-30T14:50:11.898Z",
          "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
          "author": "John Doe"
        }
      ]
    }

4. Steps Taken

4.1. Client API

My client API takes two requests:

* PUT /comment
* GET /comments/:parent_url?offset=<offset>&limit=<limit>

4.1.1 PUT /comment

Stores a comment under the parent_url given in the request. I use node_riak's `client.modify()` method, which `GET`s the current value of the parent_url key, applies the mutation (supplied by the library user; in this case it just pushes the JSON value of the comment onto the array), then `PUT`s the new value back to the parent_url key.

4.1.2 GET /comments/:parent_url?offset=<offset>&limit=<limit>

Gets the comments for the given parent_url, starting at <offset> and bounded by <limit>. Internally I just issue a `GET` to Riak; the controller of my client does the offset/limit extraction.

4.2. The Stress Test

I provisioned a new Joyent machine (an Ubuntu 12.04 with 1024 MB RAM) just to run `ab` stress tests.
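To make the concurrency concern around the read-modify-write cycle in 4.1.1 concrete, here is a minimal sketch. It does not use node_riak: `store`, `get`, `put`, `addComment`, and `demo` are hypothetical names, and an in-memory Map with artificial delays stands in for Riak and the network. It only illustrates what can happen when two writers fetch the same array before either writes it back.

```javascript
// Hypothetical in-memory stand-in for a key/value store; NOT node_riak's API.
const store = new Map();

// Simulated network round trip.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function get(key) {
  await delay(5);
  const raw = store.get(key);
  return raw ? JSON.parse(raw) : { comments: [] };
}

async function put(key, value) {
  await delay(5);
  store.set(key, JSON.stringify(value));
}

// Read-modify-write: fetch the whole array, append one comment, write it all back.
async function addComment(parentUrl, comment) {
  const doc = await get(parentUrl);
  doc.comments.push(comment);
  await put(parentUrl, doc);
}

async function demo() {
  const key = 'http://path/to/my/site/file.html';
  // Two overlapping writers: both read the empty array before either writes.
  await Promise.all([
    addComment(key, { author: 'A', text: 'first' }),
    addComment(key, { author: 'B', text: 'second' }),
  ]);
  // Number of comments that survived the two writes.
  return (await get(key)).comments.length;
}
```

In this sketch `demo()` resolves to 1 rather than 2, because the second write replaces the first writer's array. Storing one key per comment, as in the bucket-per-parent-URL suggestion, avoids this because writers never share a value.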
I ran six tests:

    API method   Requests   Concurrency
    PUT (*1)     10,000     5
    PUT (*1)     10,000     50
    PUT (*1)     10,000     500
    GET (*2)     10,000     5
    GET (*2)     10,000     50
    GET (*2)     10,000     500

    (*1) PUT /comment
    (*2) GET comments/http%3A%2F%2Fpath%2Fto%2Fmy%2Fsite%2F1111.html?offset=25&limit=20

5. Results

The following tables show the results I got on each test:

    Percentile   PUT 10000 5   PUT 10000 50   PUT 10000 500
    50%          116           1879           20876
    65%          142           1990           21491
    70%          161           2068           21919
    85%          177           2124           22202
    90%          274           2364           23136
    95%          486           2734           23914
    98%          751           4062           25036
    99%          1165          4591           25835
    100%         1065          11258          29611

    Percentile   GET 10000     GET 10000      GET 10000
    50%          68            631            6363
    65%          75            673            6636
    70%          80            701            6820
    85%          83            719            6934
    90%          94            783            7218
    95%          107           913            7442
    98%          145           1054           7691
    99%          475           1099           7836
    100%         535           1265           8435

6. Conclusion

At first sight, I'm getting very unfavorable results (compared with a single-table, unconfigured MySQL under the very same requests). So I'm requesting feedback from you:

a) Is it a good idea to use Riak as a comment store?
b) Are these times expected? (In other words, where am I making a big mistake?)

Regards,

Herman Junge
@hermanjunge

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
