Still, doesn't that failure point to a typical overload of Riak's use of
mochiglobal (i.e., the code_server needing to lock all Erlang schedulers)? I
understand that running more than one node on a single machine is not a
realistic deployment. However, I don't see why it would cause errors, unless
Riak was unable to handle the incoming requests.
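
For context: mochiglobal stores a term by compiling and loading a one-off
module whose function returns that term, so reads are plain function calls
while every write goes through code loading. A minimal sketch of the pattern
(the module name and key mangling below are mine, not mochiglobal's):

    -module(mg_sketch).
    -export([store/2, fetch/1]).

    %% Abstract forms for:  -module(Mod). -export([term/0]). term() -> Value.
    forms(Mod, Value) ->
        [{attribute, 1, module, Mod},
         {attribute, 1, export, [{term, 0}]},
         {function, 1, term, 0,
          [{clause, 1, [], [], [erl_parse:abstract(Value)]}]}].

    store(Key, Value) when is_atom(Key) ->
        Mod = list_to_atom("mg_key_" ++ atom_to_list(Key)),
        {ok, Mod, Bin} = compile:forms(forms(Mod, Value)),
        %% The load is the costly step: it goes through the code server,
        %% which is the scheduler-locking behaviour referred to above.
        {module, Mod} = code:load_binary(Mod, "mg_sketch", Bin),
        ok.

    fetch(Key) when is_atom(Key) ->
        Mod = list_to_atom("mg_key_" ++ atom_to_list(Key)),
        Mod:term().  %% a read is just a local call, no locking involved
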
On 10/01/2012 01:54 PM, Alexander Sicular wrote:
> Any time you overload one box you run into all sorts of I/O dreck; screw
> with your conf files and mess with your versions on top of that, and you
> have too many variables in the mix to get anything meaningful out of what
> you were trying to do. Since this is a test, just tear the whole thing down
> and start clean.
>
> If you want to dev-test your app, just use one node and dial the n_val
> down to one in app.config. That setting isn't actually there by default, so
> you'll have to add it manually to the riak_core section like so (with some
> other stuff):
>
> {default_bucket_props, [{n_val, 1},
>                         {allow_mult, false},
>                         {last_write_wins, false},
>                         {precommit, []},
>                         {postcommit, []},
>                         {chash_keyfun, {riak_core_util, chash_std_keyfun}}
>                        ]}
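>
> For placement: that block goes inside the riak_core section of app.config.
> A minimal sketch, where the ring_state_dir line just stands in for whatever
> your riak_core section already contains:
>
> {riak_core, [
>              {ring_state_dir, "data/ring"},
>              {default_bucket_props, [{n_val, 1}]}
>             ]},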
>
> (Hey Basho people, that stuff should be in the app.config file by default.
> Making people go fish for it and figure out how and where to add this stuff
> is kinda unnecessary. Here is an example of a great conf file with everything
> you can conf and a whole bunch of docs:
> https://github.com/antirez/redis/blob/unstable/redis.conf ).
>
> If you want to performance-test your app, make your dev system as similar
> to your prod system as possible and knock it out.
>
>
> -Alexander Sicular
>
> @siculars
>
> On Oct 1, 2012, at 4:30 PM, Callixte Cauchois wrote:
>
>> Thank you, but can you explain a bit more?
>> I mean, I understand why it is a bad thing with regard to reliability and
>> in case of hardware issues. But does it also have an impact on the
>> behaviour when the hardware is performing correctly and the load on the
>> machines is the same?
>>
>> On Mon, Oct 1, 2012 at 1:25 PM, Alexander Sicular <[email protected]>
>> wrote:
>>
>> Inline.
>>
>> -Alexander Sicular
>>
>> @siculars
>>
>> On Oct 1, 2012, at 3:23 PM, Callixte Cauchois wrote:
>>
>> > Hi there,
>> >
>> > so, I am currently evaluating Riak to see how it can fit into our
>> > platform. To do so I have set up a cluster of 4 nodes on SmartOS, all of
>> > them on the same physical box.
>>
>> Mistake. Just stop here. Nothing else matters. Do not put all your
>> virtual machines (Riak nodes) on one physical machine. Put 'em on
>> different physical machines. Fix the config files and try again.
>>
>> > I then built a simple application in node.js that gets log events from
>> > our production system through a RabbitMQ queue and stores them in my
>> > cluster. I let Riak generate the ids, but I added two secondary indices
>> > to be able to more easily retrieve all the log events that belong to a
>> > single session.
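>> > (To make the index part concrete: over the HTTP interface, a write that
>> > attaches such an index, and the matching query, look roughly like this;
>> > the bucket and index names are just examples:
>> >
>> > POST /buckets/logs/keys                  <- Riak generates the key
>> > x-riak-index-session_bin: <session id>
>> >
>> > GET /buckets/logs/index/session_bin/<session id> )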
>> > Everything was going fine: events coming in at around 130 messages per
>> > second are easily ingested by Riak. When I stop it and then restart it,
>> > there is a bit of an issue, as the events are read from the queue at 1500
>> > messages per second and the insertion times go up, so I need some retries
>> > to actually store everything.
>> > I wanted to tweak the LevelDB params to increase the throughput. To do
>> > so, I first upgraded from 1.1.6 to 1.2.0. I chose what I thought was the
>> > safest way: node by node, I had each one leave the cluster, upgraded it,
>> > then had it join again. During the whole process I kept inserting.
>> > It went quite well. But when I ran some queries using 2i, I got errors
>> > and realized that for two of my four nodes I had forgotten to put back
>> > eLevelDB as the default engine. As soon as I ran that query, everything
>> > went haywire: a lot of inserts failed, and some nodes were not reachable
>> > via the ping URL.
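>> > (For reference, the engine is selected by the storage_backend entry in
>> > the riak_kv section of app.config; the line those two nodes were missing
>> > looks like this, with the other riak_kv settings omitted:
>> >
>> > {riak_kv, [{storage_backend, riak_kv_eleveldb_backend}]},  )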
>> > I changed the default engine and restarted those nodes; nothing
>> > changed. I tried to make them leave the cluster, but after two days they
>> > are still leaving. riak-admin transfers reports that a lot of transfers
>> > need to occur, but the system is stuck: the numbers there do not change.
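>> > (Besides riak-admin transfers, the 1.2 tooling also has riak-admin
>> > member-status and riak-admin ring-status, which show which members are
>> > still marked as leaving and whether the ring has converged.)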
>> >
>> > I guess I have done several things wrong. It is test data, so it
>> > doesn't really matter if I lose data or have to restart from scratch,
>> > but I want to understand what went wrong and how I could have fixed it.
>> > Or whether I can even recover from here now.
>> >
>> > Thank you.
>> > C.
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com