The repair worked..

Total 2i items scanned: 12961170
Total tree objects: 9258332
Total objects fixed: 538423
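For a sense of scale, here is a minimal Python sketch (not part of Riak) that parses counters in the `Name: value` shape that repair-2i prints and works out what share of the tree objects were fixed. The helper name `parse_counters` is made up for illustration; the counter names are copied from the repair-2i output quoted elsewhere in this thread.

```python
import re

# Summary counters as reported by repair-2i in this thread.
SUMMARY = """\
Total 2i items scanned: 12961170
Total tree objects: 9258332
Total objects fixed: 538423
"""


def parse_counters(text):
    """Return a dict mapping each 'Name: integer' line to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_counters(SUMMARY)
# Share of the tree objects that the repair had to fix.
fixed_share = counters["Total objects fixed"] / counters["Total tree objects"]
print(f"fixed {fixed_share:.1%} of tree objects")
```

With the numbers above this comes out at roughly 5.8% of the tree objects.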
So it seems that it had to fix a lot. But shouldn't this be impossible with AAE enabled?

Sent from Samsung Mobile

-------- Original message --------
From: "Effenberg, Simon"
Date: 03.09.2014 16:28 (GMT+01:00)
To: bryan hunt
Cc: [email protected]
Subject: Re: repair-2i stops with "bad argument in call to eleveldb:async_write"

I have now changed the timeout in the code from 5 to 60 mins on one node.. after 23 mins the repair-2i was able to continue:

2014-09-03 12:58:06.913 UTC [info] <0.10345.8>@riak_kv_2i_aae:repair_partition:257 Acquired lock on partition 548063113999088594326381812268606132370974703616
2014-09-03 12:58:06.913 UTC [info] <0.10345.8>@riak_kv_2i_aae:repair_partition:259 Repairing indexes in partition 548063113999088594326381812268606132370974703616
2014-09-03 12:58:06.924 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:324 Creating temporary database of 2i data in /var/lib/riak/anti_entropy/2i/tmp_db
2014-09-03 12:58:06.928 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:361 Grabbing all index data for partition 548063113999088594326381812268606132370974703616
2014-09-03 13:25:23.946 UTC [info] <0.10345.8>@riak_kv_2i_aae:create_index_data_db:375 Grabbed 12961170 index data entries from partition 548063113999088594326381812268606132370974703616
2014-09-03 13:25:23.946 UTC [info] <0.10345.8>@riak_kv_2i_aae:build_tmp_tree:448 Building tree for 2i data on disk for partition 548063113999088594326381812268606132370974703616
2014-09-03 13:29:13.478 UTC [info] <0.10345.8>@riak_kv_2i_aae:build_tmp_tree:478 Done building temporary tree for 2i data with 9258332 entries
2014-09-03 13:29:13.478 UTC [info] <0.10345.8>@riak_kv_2i_aae:do_exchange:496 Reconciling 2i data
..... (still running)

Maybe this works.. maybe it will break.. I'm looking forward to seeing the result.

Cheers
Simon

On Mon, Aug 11, 2014 at 08:24:44AM +0000, Effenberg, Simon wrote:
> Hi,
>
> any updates on this issue? I'm still able to search a range of 2i and
> I'm getting 3 results.. 0, 557 and 13853 :(..
>
> I cannot rely on 2i right now nor can I repair it.
>
> Cheers
> Simon
>
> On Fri, Aug 08, 2014 at 07:12:58AM +0000, Effenberg, Simon wrote:
> > Hi Bryan,
> >
> > thanks for this. I tried it but to be honest I cannot see any specific
> > stuff in the logs (on the specific host).
> >
> > I attached the logfile from the specific node. If you think it is
> > also/more important to look into the logfiles on the other nodes I can
> > send them as well.. but a quick look into all of them (searching for
> > "2i" and "index") didn't show anything unusual.. the only stuff was
> >
> > 2014-08-07 05:44:11.298 UTC [debug]
> > <0.969.0>@riak_kv_index_hashtree:handle_call:240 Updating tree:
> > (vnode)=633697975561446187189878970435575840553939501056
> > (preflist)={610862012478150829092946394924383918371815555072,12}
> >
> > and searching for errors didn't show more than you see in the attached
> > files:
> >
> > $ for host in kriak46-{1..7} kriak47-{1..6}; do echo $host; ssh $host "grep
> > '^2014-08-07 05' /var/log/riak/console.log | grep -i error" ; done
> > kriak46-1
> > 2014-08-07 05:38:28.596 UTC [error] <0.8949.566> ** Node
> > '[email protected]' not responding **
> > 2014-08-07 05:42:36.197 UTC [error] <0.24823.566> ** Node
> > '[email protected]' not responding **
> > 2014-08-07 05:43:16.213 UTC [error] <0.26434.566> ** Node
> > '[email protected]' not responding **
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> gen_server <0.1697.0>
> > terminated with reason: bad argument in call to
> > eleveldb:async_write(#Ref<0.0.567.170046>, <<>>,
> > [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
> > []) in eleveldb:write/3 line 155
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process
> > <0.1697.0> with 0 neighbours exited with reason: bad argument in call to
> > eleveldb:async_write(#Ref<0.0.567.170046>, <<>>,
> > [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
> > []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
> > 2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor
> > {<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with
> > {riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with
> > reason bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>,
> > <<>>,
> > [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
> > []) in eleveldb:write/3 line 155 in context child_terminated
> > 2014-08-07 05:50:11.390 UTC [error] <0.20983.567> ** Node
> > '[email protected]' not responding **
> > kriak46-2
> > kriak46-3
> > kriak46-4
> > kriak46-5
> > kriak46-6
> > kriak46-7
> > kriak47-1
> > kriak47-2
> > kriak47-3
> > kriak47-4
> > kriak47-5
> > kriak47-6
> >
> > You mentioned the partition repair stuff.. do you think I need to try
> > out the full repair? Is this maybe a way to fix it? Because it is quite
> > hard to do this on the cluster (~15 TB of data with AAE stuff and
> > tombstones and maybe ~10 TB without tombstones and AAE stuff) and I
> > don't want to start doing this if it won't help.
> >
> > Cheers
> > Simon
> >
> > On Wed, Aug 06, 2014 at 01:08:36PM +0100, bryan hunt wrote:
> > > Simon,
> > >
> > > If you want to get more verbose logging information, you could perform
> > > the following to change the logging level to debug, then run
> > > `repair-2i`, and finally switch back to the normal logging level.
> > >
> > > - `riak attach`
> > > - `(riak@nodename)1> SetDebug = fun() -> {node(),
> > > lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log",
> > > debug)} end.`
> > > - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetDebug,[]])).`
> > > (don't forget the period at the end of these statements)
> > > - Hit CTRL+C twice to quit from the node
> > >
> > > You can then revert to the normal `info` logging level by running
> > > the following commands via `riak attach`:
> > >
> > > - `riak attach`
> > > - `(riak@nodename)1> SetInfo = fun() -> {node(),
> > > lager:set_loglevel(lager_file_backend, "/var/log/riak/console.log",
> > > info)} end.`
> > > - `(riak@nodename)2> rp(rpc:multicall(erlang, apply, [SetInfo,[]])).`
> > > (don't forget the period at the end of these statements)
> > > - Hit CTRL+C twice to quit from the node
> > >
> > > Please also see the docs for info on `riak attach` monitoring of repairs.
> > >
> > > http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Monitoring-Repairs
> > >
> > > Repairs can also be monitored using the `riak-admin transfers` command.
> > >
> > > http://docs.basho.com/riak/1.4.9/ops/running/recovery/repairing-partitions/#Running-a-Repair
> > >
> > > Best Regards,
> > >
> > > Bryan Hunt
> > >
> > > Bryan Hunt - Client Services Engineer - Basho Technologies Limited -
> > > Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> >
> > ...
> > Total partitions: 1
> > Finished partitions: 1
> > Speed: 100
> > Total 2i items scanned: 0
> > Total tree objects: 0
> > Total objects fixed: 0
> > With errors:
> > Partition: 319703483166135013357056057156686910549735243776
> > Error: index_scan_timeout
> >
> >
> > 2014-08-07 05:48:14.284 UTC [error] <0.1697.0> CRASH REPORT Process
> > <0.1697.0> with 0 neighbours exited with reason: bad argument in call to
> > eleveldb:async_write(#Ref<0.0.567.170046>, <<>>,
> > [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
> > []) in eleveldb:write/3 line 155 in gen_server:terminate/6 line 747
> > 2014-08-07 05:48:14.284 UTC [error] <0.1692.0> Supervisor
> > {<0.1692.0>,poolboy_sup} had child riak_core_vnode_worker started with
> > {riak_core_vnode_worker,start_link,undefined} at <0.1697.0> exit with
> > reason bad argument in call to eleveldb:async_write(#Ref<0.0.567.170046>,
> > <<>>,
> > [{put,<<131,104,2,109,0,0,0,20,99,111,110,118,101,114,115,97,116,105,111,110,95,115,101,99,114,...>>,...}],
> > []) in eleveldb:write/3 line 155 in context child_terminated
>
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> --
> Simon Effenberg | Site Op | mobile.international GmbH
>
> Phone: + 49. 30. 8109. 7173
> M-Phone: + 49. 151. 5266. 1558
> Mail: [email protected]
> Web: www.mobile.de<http://www.mobile.de>
>
> Marktplatz 1 | 14532 Europarc Dreilinden | Germany
>
> ______________________________________________________
> Managing Director: Malte Krüger
> HRB No.: 18517 P, Amtsgericht Potsdam
> Registered office: Kleinmachnow
> ______________________________________________________
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Simon Effenberg | Site Op | mobile.international GmbH

Phone: + 49. 30. 8109. 7173
M-Phone: + 49. 151. 5266. 1558
Mail: [email protected]
Web: www.mobile.de<http://www.mobile.de>

Marktplatz 1 | 14532 Europarc Dreilinden | Germany

______________________________________________________
Managing Director: Malte Krüger
HRB No.: 18517 P, Amtsgericht Potsdam
Registered office: Kleinmachnow
______________________________________________________
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
