Sorry guys, the problem was indeed caused by different Erlang versions.
We moved all nodes to machines running the same Erlang version:

$ erl
Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] 
[kernel-poll:false]

and the problem seems to have gone away.
I feel stupid, and sorry that we didn't look into this before.
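
For anyone who runs into the same thing, here is a minimal sketch of a check that would have flagged the mismatch early. It pulls the OTP release and erts version out of the banner that `erl` prints and reports whether the nodes disagree. The function names and the node labels ("node-a", "node-b") are made up for illustration, the banners are the two reported in this thread, and the regular expression assumes the R14/R15-era banner format shown here.

```python
import re

# Matches the start of the banner printed by `erl`, e.g.
# "Erlang R15B (erts-5.9) [source] ..."
# Assumption: R14/R15-era banner layout, as seen in this thread.
BANNER_RE = re.compile(r"Erlang\s+(\S+)\s+\(erts-([\d.]+)\)")


def parse_banner(banner):
    """Return (otp_release, erts_version) from an `erl` banner line."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError("unrecognised erl banner: %r" % banner)
    return match.group(1), match.group(2)


def group_by_version(banners):
    """Group node labels by the (otp_release, erts_version) they report."""
    groups = {}
    for node, banner in banners.items():
        groups.setdefault(parse_banner(banner), []).append(node)
    return groups


def is_mixed(banners):
    """True when the nodes do not all report the same Erlang version."""
    return len(group_by_version(banners)) > 1


# Hypothetical node labels; banner text copied from this thread.
banners = {
    "node-a": "Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]",
    "node-b": "Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe]",
}

if __name__ == "__main__":
    for version, nodes in sorted(group_by_version(banners).items()):
        print(version, sorted(nodes))
    print("mixed Erlang versions:", is_mixed(banners))
```

Collecting the banner from each machine is left out on purpose, since how you reach the nodes depends on your setup.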

Thanks a lot guys.
--
Abhinav Singh
http://abhinavsingh.com/

On 20-Dec-2012, at 8:21 PM, Abhinav Singh <[email protected]> wrote:

> Hi Ryan,
> 
> Thanks for your thoughts on this. However, we are afraid that's not really 
> the case.
> To rule this out further, we reinstalled all the nodes in our 
> cluster from the current Riak release, downloaded from here:
> http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/riak-1.2.1.tar.gz
> 
> ~~~~~~~~~~~~
> 
> We start the Riak nodes with the following changes in the etc/app.config file:
> {pb_ip,   "172.17.3.82" },
> {http, [ {"172.17.3.82", 8098 } ]},
> {storage_backend, riak_kv_eleveldb_backend},
> {riak_search, [{enabled, true}]},
> 
> and of course we update etc/vm.args so that its first line is:
> -name [email protected]
> 
> ~~~~~~~~~~~~
> 
> Finally we start the 2 nodes, then join, plan, and commit, which gives us:
> $ ./bin/riak-admin member-status
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> valid      50.0%      --      '[email protected]'
> valid      50.0%      --      '[email protected]'
> -------------------------------------------------------------------------------
> Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
> 
> ~~~~~~~~~~~~
> 
> Also, after joining the nodes, we install the necessary search hook by running 
> the following command on the [email protected] node only:
> $ ./bin/search-cmd install ejabberd_doc
>  :: Installing Riak Search <--> KV hook on bucket 'ejabberd_doc'.
> 
> ~~~~~~~~~~~~
> 
> All search queries run fine with no docs in Riak. However, as soon as we 
> insert even a single document, search queries start to fail.
> As before, here are the entries from Riak's error.log file (the query is 
> made to node [email protected])
> 
> error.log on [email protected] contains:
> 2012-12-20 16:27:37.877 [error] <0.1821.0>@mi_server:handle_info:524 
> lookup/range failure: 
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 2012-12-20 16:27:37.878 [error] emulator Error in process <0.4075.0> on node 
> '[email protected]' with exit value: 
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 
> 2012-12-20 16:27:37.878 [error] <0.1940.0>@mi_server:handle_info:524 
> lookup/range failure: 
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 2012-12-20 16:27:37.882 [error] emulator Error in process <0.4077.0> on node 
> '[email protected]' with exit value: 
> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
> 
> error.log on [email protected] contains:
> 2012-12-20 16:28:17.944 [error] <0.2511.0> gen_server <0.2511.0> terminated 
> with reason: 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> 2012-12-20 16:28:17.949 [error] <0.2511.0> CRASH REPORT Process <0.2511.0> 
> with 2 neighbours exited with reason: 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
>  in gen_server:terminate/6 line 737
> 2012-12-20 16:28:17.952 [error] <0.172.0> Supervisor riak_api_pb_sup had 
> child undefined started with {riak_api_pb_server,start_link,undefined} at 
> <0.2511.0> exit with reason 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
>  in context child_terminated
> 
> 2012-12-20 16:28:37.888 [error] <0.2504.0> gen_server <0.2504.0> terminated 
> with reason: 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
> 2012-12-20 16:28:37.901 [error] <0.2504.0> CRASH REPORT Process <0.2504.0> 
> with 1 neighbours exited with reason: 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
>  in gen_server:terminate/6 line 737
> 2012-12-20 16:28:37.904 [error] <0.172.0> Supervisor riak_api_pb_sup had 
> child undefined started with {riak_api_pb_server,start_link,undefined} at 
> <0.2504.0> exit with reason 
> {error,{case_clause,timeout},[{riak_search_client,search_doc,8,[{file,"src/riak_search_client.erl"},{line,165}]},{riak_search_utils,run_query,7,[{file,"src/riak_search_utils.erl"},{line,283}]},{riak_search_pb_query,run_query,7,[{file,"src/riak_search_pb_query.erl"},{line,96}]},{riak_search_pb_query,process,2,[{file,"src/riak_search_pb_query.erl"},{line,80}]},{riak_api_pb_server,process_message,4,[{file,"src/riak_api_pb_server.erl"},{line,203}]},{riak_api_pb_server,handle_info,2,[{file,"src/..."},...]},...]}
>  in context child_terminated
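
A rough sketch of how the two failure signatures above could be tallied from an error.log, to see at a glance which node reports `badfun` and which reports the `case_clause` timeout. The function name and the sample list are assumptions made for illustration; the sample entries are copied from the logs above, re-joined onto single lines (the mail client wrapped them), and the second one is shortened.

```python
import re
from collections import Counter

# Captures the fun identifier inside a {badfun,#Fun<...>} term.
BADFUN_RE = re.compile(r"\{badfun,#Fun<([^>]+)>\}")
TIMEOUT_MARKER = "{case_clause,timeout}"


def summarise(lines):
    """Count badfun and case_clause-timeout entries in error.log lines."""
    counts = Counter()
    for line in lines:
        match = BADFUN_RE.search(line)
        if match:
            counts["badfun " + match.group(1)] += 1
        if TIMEOUT_MARKER in line:
            counts["case_clause timeout"] += 1
    return counts


# Entries from this thread, un-wrapped; the second one is shortened.
sample = [
    '2012-12-20 16:27:37.877 [error] <0.1821.0>@mi_server:handle_info:524 '
    'lookup/range failure: {{badfun,#Fun<riak_search_client.9.56347389>},'
    '[{mi_server,iterate,6},{mi_server,lookup,8}]}',
    '2012-12-20 16:28:17.944 [error] <0.2511.0> gen_server <0.2511.0> '
    'terminated with reason: {error,{case_clause,timeout},'
    '[{riak_search_client,search_doc,8,...',
]

if __name__ == "__main__":
    for signature, count in sorted(summarise(sample).items()):
        print(count, signature)
```

To run it against a real file, pass the open file object instead of `sample`, for example `summarise(open("log/error.log"))`; the path is only a guess at where the log lives.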
> 
> ~~~~~~~~~~~~
> 
> We don't really have mixed Riak releases. But yes, we do have mixed Erlang 
> releases. Not sure if that makes any difference here.
> 
> [email protected]
> Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] 
> [async-threads:0] [kernel-poll:false]
> 
> [email protected]
> Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] 
> [kernel-poll:false]
> 
> ~~~~~~~~~~~~
> 
> Unfortunately, none of these errors happen in our local dev environment.
> On my local dev box, I run a 5-node cluster (of course, all nodes are on the 
> same physical machine).
> 
> I would really appreciate it if you could share your thoughts further on this 
> particular issue.
> There is still some time before our application goes into production, and 
> we would really like to fix this before the app release.
> 
> Thanks in advance for your time.
> 
> --
> Abhinav Singh
> http://abhinavsingh.com/
> 
> On 14-Dec-2012, at 10:04 PM, Ryan Zezeski <[email protected]> wrote:
> 
>> 
>> Hi, comments inline
>> 
>> On Wed, Dec 5, 2012 at 8:10 AM, Abhinav Singh <[email protected]> wrote:
>> 
>> We are facing an issue where search queries work fine on my local dev box 
>> (which has riak-1.2.1rc2 installed).
>> However, the same queries time out on our production boxes (which have 
>> riak-1.2.1 installed):
>> 
>> 2012-12-05 14:49:59.777 [error] <0.1035.0>@mi_server:handle_info:524 
>> lookup/range failure: 
>> {{badfun,#Fun<riak_search_client.9.56347389>},[{mi_server,iterate,6},{mi_server,lookup,8}]}
>> 
>> Did you recently upgrade your production boxes?  The 'badfun' error is an 
>> indication that you currently have a "mixed" cluster.  The error will occur 
>> when two or more machines are involved and they are not all the same 
>> version.  This is a bug in Riak Search.
>>  
>> 
>> This query does succeed sometimes (1-5%), but fails most of the time.
>> I want to know if the above logs point to a particular error with 
>> our riak cluster?
>> 
>> Yes, so in 1-5% of the cases the nodes involved in a query are all the same 
>> version.  The reason this is non-deterministic is that Riak Search uses 
>> some randomness during query time to help spread load around.
>>  
>> 
>> Since this query has never failed on my local development box, 
>> I suspect it either has to do with something that changed between 1.2.1rc2 
>> and the 1.2.1-stable release, or is related to our production riak 
>> cluster.
>> 
>> 
>> 
>> As I said above.  I strongly suspect a mixed cluster scenario.  That's the 
>> only time I've seen an error like the above.  The second email also strongly 
>> indicates a mixed cluster scenario given the behavior you are seeing.
>> 
>> -Z
>> 
> 
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

