Hi Simon,

Sorry for the delays. I’m on vacation for a couple of days. Will pick this up on Monday.

Cheers

Russell

On 1 Aug 2014, at 09:56, Effenberg, Simon <[email protected]> wrote:

> Hi Russell, @basho
>
> any updates on this? We still have the issues with 2i (repair is also
> still not possible), and searching the 2i indexes reproducibly
> returns (for one range I tested) 3 different values.
>
> I would love to provide anything you need to debug that issue.
>
> Cheers
> Simon
>
> On Wed, Jul 30, 2014 at 09:22:56AM +0000, Effenberg, Simon wrote:
>> Great. Thanks Russell.
>>
>> If you need me to do something, feel free to ask.
>>
>> Cheers
>> Simon
>>
>> On Wed, Jul 30, 2014 at 10:19:56AM +0100, Russell Brown wrote:
>>> Thanks Simon,
>>>
>>> I’m going to spend some time on this today.
>>>
>>> Cheers
>>>
>>> Russell
>>>
>>> On 30 Jul 2014, at 10:05, Effenberg, Simon <[email protected]>
>>> wrote:
>>>
>>>> Hi Russell,
>>>>
>>>> still one machine out of 13 is on wheezy and the rest are on squeeze, but the
>>>> software is the same and Basho even provides the Erlang stuff. So
>>>> there should be no real difference inside the application.
>>>>
>>>> And the errors are almost the same (except the async_write/read
>>>> difference).
>>>>
>>>> I paste them:
>>>>
>>>> ---------- node 1 -----------
>>>>
>>>> 2014-07-30 06:16:07.728 UTC [info]
>>>> <0.14871.336>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 125597796958124469533129165311555572001681702912
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1525.0> gen_server <0.1525.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.324.211123>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1525.0> CRASH REPORT Process
>>>> <0.1525.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.324.211123>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:07.728 UTC [error] <0.1517.0> Supervisor
>>>> {<0.1517.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.1525.0> exit with
>>>> reason bad argument in call
>>>> to eleveldb:async_write(#Ref<0.0.324.211123>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in context child_terminated
>>>>
>>>>
>>>> ---------- node 2 -----------
>>>>
>>>> 2014-07-30 06:16:07.791 UTC [info]
>>>> <0.8083.314>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 622279994019798508141412682679979879462877528064
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:16:07.791 UTC [error] <0.1884.0> gen_server <0.1884.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.318.96628>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:07.791 UTC [error] <0.1884.0> CRASH REPORT Process
>>>> <0.1884.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.318.96628>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:07.792 UTC [error] <0.1875.0> Supervisor
>>>> {<0.1875.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.1884.0> exit with
>>>> reason bad argument in call
>>>> to eleveldb:async_write(#Ref<0.0.318.96628>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in context child_terminated
>>>>
>>>> ---------- node 3 -----------
>>>>
>>>> 2014-07-30 06:17:42.679 UTC [info]
>>>> <0.15746.299>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 291158529312015815735890337767697007822080311296
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.975.0> gen_server <0.975.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.2075.159423>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.975.0> CRASH REPORT Process
>>>> <0.975.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.2075.159423>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:17:42.679 UTC [error] <0.969.0> Supervisor
>>>> {<0.969.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.975.0> exit with
>>>> reason bad argument in call to eleveldb:async_write(#Ref<0.0.2075.159423>,
>>>> <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in context child_terminated
>>>>
>>>> ---------- node 4 -----------
>>>>
>>>> 2014-07-30 06:16:10.004 UTC [info]
>>>> <0.28895.382>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 319703483166135013357056057156686910549735243776
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:16:10.004 UTC [error] <0.1580.0> gen_server <0.1580.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.367.155781>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155
>>>> 2014-07-30 06:16:10.004 UTC [error] <0.1580.0> CRASH REPORT Process
>>>> <0.1580.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_write(#Ref<0.0.367.155781>, <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:10.005 UTC [error] <0.1570.0> Supervisor
>>>> {<0.1570.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.1580.0> exit with
>>>> reason bad argument in call to eleveldb:async_write(#Ref<0.0.367.155781>,
>>>> <<>>,
>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>> []) in eleveldb:write/3 line 155 in context child_terminated
>>>>
>>>> ---------- node 5 -----------
>>>>
>>>> 2014-07-30 06:16:09.191 UTC [info]
>>>> <0.15985.355>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 833512652540280570538039006158505159647524028416
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:16:09.191 UTC [error] <0.1601.0> gen_server <0.1601.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_get(#Ref<0.0.351.26505>, <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143
>>>> 2014-07-30 06:16:09.191 UTC [error] <0.1601.0> CRASH REPORT Process
>>>> <0.1601.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_get(#Ref<0.0.351.26505>, <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:09.192 UTC [error] <0.1598.0> Supervisor
>>>> {<0.1598.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.1601.0> exit with
>>>> reason bad argument in call to eleveldb:async_get(#Ref<0.0.351.26505>,
>>>> <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143 in context child_terminated
>>>>
>>>> ---------- node 6 -----------
>>>>
>>>> 2014-07-30 06:16:09.154 UTC [info]
>>>> <0.32042.379>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>> Total partitions: 1
>>>> Finished partitions: 1
>>>> Speed: 100
>>>> Total 2i items scanned: 0
>>>> Total tree objects: 0
>>>> Total objects fixed: 0
>>>> With errors:
>>>> Partition: 34253944624943037145398863266787883273185918976
>>>> Error: index_scan_timeout
>>>>
>>>>
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4086.0> gen_server <0.4086.0>
>>>> terminated with reason: bad argument in call to
>>>> eleveldb:async_get(#Ref<0.0.2698.198008>, <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4086.0> CRASH REPORT Process
>>>> <0.4086.0> with 0 neighbours exited with reason: bad argument in call to
>>>> eleveldb:async_get(#Ref<0.0.2698.198008>, <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143 in gen_server:terminate/6 line 747
>>>> 2014-07-30 06:16:09.154 UTC [error] <0.4085.0> Supervisor
>>>> {<0.4085.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>> {riak_core_vnode_worker,start_link,undefined} at <0.4086.0> exit with
>>>> reason bad argument in call to eleveldb:async_get(#Ref<0.0.2698.198008>,
>>>> <<>>,
>>>> <<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,101,116,...>>,
>>>> []) in eleveldb:get/3 line 143 in context child_terminated
>>>>
>>>> On Wed, Jul 30, 2014 at 09:50:22AM +0100, Russell Brown wrote:
>>>>> Hi Simon,
>>>>> So the earlier “this is on wheezy, rest are on squeeze” thing is no
>>>>> longer a factor?
>>>>>
>>>>> Any and all 2i repair you do ends with the same error?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Russell
>>>>>
>>>>> On 30 Jul 2014, at 07:29, Effenberg, Simon <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I have now tried it with one partition on 6 different machines, and the
>>>>>> result is the same everywhere: index_scan_timeout and the info: bad argument in call
>>>>>> to eleveldb:async_get (2x) or async_write (4x).
>>>>>>
>>>>>>
>>>>>> Sent from Samsung Mobile
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: "Effenberg, Simon"
>>>>>> Date: 30.07.2014 07:49 (GMT+01:00)
>>>>>> To: bryan hunt
>>>>>> Cc: [email protected]
>>>>>> Subject: RE: repair-2i stops with "bad argument in call to
>>>>>> eleveldb:async_write"
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried it on two different nodes with one partition each, on both multiple
>>>>>> times before the upgrade and after the upgrade.
>>>>>>
>>>>>> I will try it on other machines in a minute, but because I have already tried it
>>>>>> on two different nodes, and one of them is 2 weeks old and stored
>>>>>> on an HP 3PAR, I bet that this is not a disk corruption issue.
>>>>>>
>>>>>> Simon
>>>>>>
>>>>>>
>>>>>> Sent from Samsung Mobile
>>>>>>
>>>>>>
>>>>>> -------- Original message --------
>>>>>> From: bryan hunt
>>>>>> Date: 29.07.2014 18:21 (GMT+01:00)
>>>>>> To: "Effenberg, Simon"
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: repair-2i stops with "bad argument in call to
>>>>>> eleveldb:async_write"
>>>>>>
>>>>>> Hi Simon,
>>>>>>
>>>>>> Does the problem persist if you run it again?
>>>>>>
>>>>>> Does it happen if you run it against any other partition?
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Bryan
>>>>>>
>>>>>>
>>>>>>
>>>>>> Bryan Hunt - Client Services Engineer - Basho Technologies Limited -
>>>>>> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
>>>>>>
>>>>>> On 29 Jul 2014, at 09:35, Effenberg, Simon <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> we have some issues with 2i queries like this:
>>>>>>>
>>>>>>> seffenberg@kriak46-1:~$ while :; do curl -s
>>>>>>> localhost:8098/buckets/conversation/index/createdat_int/0/23182680 |
>>>>>>> ruby -rjson -e "o = JSON.parse(STDIN.read); puts o['keys'].size"; sleep
>>>>>>> 1; done
>>>>>>>
>>>>>>> 13853
>>>>>>> 13853
>>>>>>> 0
>>>>>>> 557
>>>>>>> 557
>>>>>>> 557
>>>>>>> 13853
>>>>>>> 0
>>>>>>>
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> So I tried to start a repair-2i first on one vnode/partition on one node
>>>>>>> (which is quite new in the cluster, 2 weeks or so).
>>>>>>>
>>>>>>> The command fails with the following log entries:
>>>>>>>
>>>>>>> seffenberg@kriak46-7:~$ sudo riak-admin repair-2i
>>>>>>> 22835963083295358096932575511191922182123945984
>>>>>>> Will repair 2i on these partitions:
>>>>>>> 22835963083295358096932575511191922182123945984
>>>>>>> Watch the logs for 2i repair progress reports
>>>>>>> seffenberg@kriak46-7:~$ 2014-07-29 08:20:22.729 UTC [info]
>>>>>>> <0.5929.1061>@riak_kv_2i_aae:init:139 Starting 2i repair at speed 100
>>>>>>> for partitions [22835963083295358096932575511191922182123945984]
>>>>>>> 2014-07-29 08:20:22.729 UTC [info]
>>>>>>> <0.5930.1061>@riak_kv_2i_aae:repair_partition:257 Acquired lock on
>>>>>>> partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:20:22.729 UTC [info]
>>>>>>> <0.5930.1061>@riak_kv_2i_aae:repair_partition:259 Repairing indexes in
>>>>>>> partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:20:22.740 UTC [info]
>>>>>>> <0.5930.1061>@riak_kv_2i_aae:create_index_data_db:324 Creating
>>>>>>> temporary database of 2i data in /var/lib/riak/anti_entropy/2i/tmp_db
>>>>>>> 2014-07-29 08:20:22.751 UTC [info]
>>>>>>> <0.5930.1061>@riak_kv_2i_aae:create_index_data_db:361 Grabbing all
>>>>>>> index data for partition 22835963083295358096932575511191922182123945984
>>>>>>> 2014-07-29 08:25:22.752 UTC [info]
>>>>>>> <0.5929.1061>@riak_kv_2i_aae:next_partition:160 Finished 2i repair:
>>>>>>> Total partitions: 1
>>>>>>> Finished partitions: 1
>>>>>>> Speed: 100
>>>>>>> Total 2i items scanned: 0
>>>>>>> Total tree objects: 0
>>>>>>> Total objects fixed: 0
>>>>>>> With errors:
>>>>>>> Partition: 22835963083295358096932575511191922182123945984
>>>>>>> Error: index_scan_timeout
>>>>>>>
>>>>>>>
>>>>>>> 2014-07-29 08:25:22.752 UTC [error] <0.4711.1061> gen_server
>>>>>>> <0.4711.1061> terminated with reason: bad argument in call to
>>>>>>> eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>,
>>>>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>>>>> []) in eleveldb:write/3 line 155
>>>>>>> 2014-07-29 08:25:22.753 UTC [error] <0.4711.1061> CRASH REPORT Process
>>>>>>> <0.4711.1061> with 0 neighbours exited with reason: bad argument in
>>>>>>> call to eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>,
>>>>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>>>>> []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
>>>>>>> 2014-07-29 08:25:22.753 UTC [error] <0.1031.0> Supervisor
>>>>>>> {<0.1031.0>,poolboy_sup} had child riak_core_vnode_worker started with
>>>>>>> {riak_core_vnode_worker,start_link,undefined} at <0.4711.1061> exit
>>>>>>> with reason bad argument in call to
>>>>>>> eleveldb:async_write(#Ref<0.0.10120.211816>, <<>>,
>>>>>>> [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
>>>>>>> []) in eleveldb:write/3 line 155 in context child_terminated
>>>>>>>
>>>>>>>
>>>>>>> Anything I can do about that? What's the issue here?
>>>>>>>
>>>>>>> I'm using Riak 1.4.8 (.deb package).
>>>>>>>
>>>>>>> Cheers
>>>>>>> Simon
>>>>>>> _______________________________________________
>>>>>>> riak-users mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>
>>>> --
>>>> Simon Effenberg | Site Op | mobile.international GmbH
>>>>
>>>> Phone: + 49. 30. 8109. 7173
>>>> M-Phone: + 49. 151. 5266. 1558
>>>> Mail: [email protected]
>>>> Web: www.mobile.de
>>>>
>>>> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>>>>
>>>> ______________________________________________________
>>>> Managing Director: Malte Krüger
>>>> Commercial register no.: HRB 18517 P, Potsdam District Court
>>>> Registered office: Kleinmachnow
>>>> ______________________________________________________
