Sorry guys, the problem was indeed the different Erlang versions. We moved all nodes to machines running the same Erlang version:
$ erl
Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]

and the problem seems to have gone away. I feel stupid, and sorry that we didn't look into this earlier.

Thanks a lot guys.

--
Abhinav Singh
http://abhinavsingh.com/

On 20-Dec-2012, at 8:21 PM, Abhinav Singh <[email protected]> wrote:

> Hi Ryan,
>
> Thanks for your thoughts on this. However, we are afraid that is not really
> the case.
> To rule it out further, we went ahead and reinstalled all the nodes in our
> cluster by downloading the current Riak release from here:
> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/riak-1.2.1.tar.gz
>
> ~~~~~~~~~~~~
>
> We start the Riak nodes with the following changes inside the etc/app.config file:
> {pb_ip, "172.17.3.82" },
> {http, [ {"172.17.3.82", 8098 } ]},
> {storage_backend, riak_kv_eleveldb_backend},
> {riak_search, [{enabled, true}]},
>
> and of course we update etc/vm.args to have as its first line:
> -name [email protected]
>
> ~~~~~~~~~~~~
>
> Finally we start 2 nodes in the cluster, then join, plan, and commit, to end up with:
> $ ./bin/riak-admin member-status
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> valid      50.0%      --      '[email protected]'
> valid      50.0%      --      '[email protected]'
> -------------------------------------------------------------------------------
> Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
> ~~~~~~~~~~~~
>
> Also, after joining the nodes, we install the necessary search hook by running
> the following command on the [email protected] node only:
> $ ./bin/search-cmd install ejabberd_doc
> :: Installing Riak Search <--> KV hook on bucket 'ejabberd_doc'.
>
> ~~~~~~~~~~~~
>
> All search queries run fine with no docs inside Riak. However, as soon as we
> insert even a single document, search queries start to fail.
> As before, here are the logs from the Riak error.log file (the query is
> made to node [email protected])
>
> error.log on [email protected] contains:
> 2012-12-20 16:27:37.877 [error] <0.1821.0>@mi_server:handle_info:524
> lookup/range failure:
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 2012-12-20 16:27:37.878 [error] emulator Error in process <0.4075.0> on node
> '[email protected]' with exit value:
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>
> 2012-12-20 16:27:37.878 [error] <0.1940.0>@mi_server:handle_info:524
> lookup/range failure:
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 2012-12-20 16:27:37.882 [error] emulator Error in process <0.4077.0> on node
> '[email protected]' with exit value:
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>
> error.log on [email protected] contains:
> 2012-12-20 16:28:17.944 [error] <0.2511.0> gen_server <0.2511.0> terminated
> with reason:
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> 2012-12-20 16:28:17.949 [error] <0.2511.0> CRASH REPORT Process <0.2511.0>
> with 2 neighbours exited with reason:
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> in gen_server:terminate/6 line 737
> 2012-12-20 16:28:17.952 [error] <0.172.0> Supervisor riak_api_pb_sup had
> child undefined started with {riak_api_pb_server,start_link,undefined} at
> <0.2511.0> exit with reason
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> in context child_terminated
>
> 2012-12-20 16:28:37.888 [error] <0.2504.0> gen_server <0.2504.0> terminated
> with reason:
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> 2012-12-20 16:28:37.901 [error] <0.2504.0> CRASH REPORT Process <0.2504.0>
> with 1 neighbours exited with reason:
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> in gen_server:terminate/6 line 737
> 2012-12-20 16:28:37.904 [error] <0.172.0> Supervisor riak_api_pb_sup had
> child undefined started with {riak_api_pb_server,start_link,undefined} at
> <0.2504.0> exit with reason
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> in context child_terminated
>
> ~~~~~~~~~~~~
>
> We don't really have a mixed Riak release. But yes, we do have mixed Erlang
> releases. Not sure if that makes any difference here.
>
> [email protected]
> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
> [async-threads:0] [kernel-poll:false]
>
> [email protected]
> Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe]
> [kernel-poll:false]
>
> ~~~~~~~~~~~~
>
> Unfortunately none of these errors happen in our local dev environment.
> On my local dev box, I run a 5 node cluster (of course with all nodes on the
> same physical machine).
>
> I would really appreciate it if you could share your thoughts further on this
> particular issue.
> There is still some time before our application goes to production, and we
> would really like to fix this before the app release.
>
> Thanks in advance for your time.
>
> --
> Abhinav Singh
> http://abhinavsingh.com/
>
> On 14-Dec-2012, at 10:04 PM, Ryan Zezeski <[email protected]> wrote:
>
>>
>> Hi, comments inline
>>
>> On Wed, Dec 5, 2012 at 8:10 AM, Abhinav Singh <[email protected]> wrote:
>>
>> We are facing an issue where search queries work fine on my local dev box
>> (which has riak-1.2.1rc2 installed).
>> However, the same queries time out on our production boxes (which have riak-1.2.1
>> installed):
>>
>> 2012-12-05 14:49:59.777 [error] <0.1035.0>@mi_server:handle_info:524
>> lookup/range failure:
>> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>>
>> Did you recently upgrade your production boxes? The 'badfun' error is an
>> indication that you currently have a "mixed" cluster. The error will occur
>> when two or more machines are involved and they are not all the same
>> version. This is a bug in Riak Search.
>>
>>
>> This query does succeed sometimes (1-5%), but fails most of the time.
>> I want to know if the above logs point to a particular error with
>> our Riak cluster?
>>
>> Yes, so in 1-5% of the cases the nodes involved in a query are all the same
>> version. The reason this is non-deterministic is that Riak Search uses
>> some randomness at query time to help spread load around.
>>
>>
>> Since this query has never failed on my local development box,
>> I suspect it either has to do with something that changed between the 1.2.1rc2
>> and 1.2.1-stable releases or something that is related to our production Riak
>> cluster.
>>
>>
>>
>> As I said above, I strongly suspect a mixed cluster scenario. That's the
>> only time I've seen an error like the above. The second email also strongly
>> indicates a mixed cluster scenario given the behavior you are seeing.
>>
>> -Z
>>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
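For anyone who lands on this thread with the same badfun error: the root cause here was nodes running different Erlang versions, which is easy to miss when the Riak release itself is identical everywhere. Below is a minimal Python sketch for comparing `erl` startup banners collected from each node. It assumes the R14/R15-style banner format quoted in this thread (later releases print a different banner), and the node labels `node-a` and `node-b` as well as the function names are hypothetical, chosen only for illustration.

```python
import re

# Matches the release and ERTS version in an R14/R15-style `erl` banner, e.g.
# "Erlang R15B (erts-5.9) [source] [64-bit] ..."
BANNER_RE = re.compile(r"Erlang\s+(\S+)\s+\(erts-([\d.]+)\)")


def parse_banner(banner):
    """Return (otp_release, erts_version) parsed from an erl banner line."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError("not a recognised Erlang banner: %r" % banner)
    return match.group(1), match.group(2)


def group_by_version(banners):
    """Map each (otp_release, erts_version) pair to the sorted node labels reporting it."""
    groups = {}
    for node, banner in banners.items():
        groups.setdefault(parse_banner(banner), []).append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def is_mixed(banners):
    """True when the nodes do not all report the same Erlang version."""
    return len(group_by_version(banners)) > 1


# Banners as quoted in this thread; the node labels are placeholders.
banners = {
    "node-a": "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] "
              "[async-threads:0] [kernel-poll:false]",
    "node-b": "Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] "
              "[async-threads:0] [hipe] [kernel-poll:false]",
}

if is_mixed(banners):
    # List every version found, with the nodes that run it.
    for (release, erts), nodes in sorted(group_by_version(banners).items()):
        print("%s (erts-%s): %s" % (release, erts, ", ".join(nodes)))
```

Feeding it one banner per node (for example, copied from running `erl` on each machine) gives a quick yes/no on whether the cluster is mixed at the Erlang level before digging into Riak itself.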
