riak key sharding

2013-12-10 Thread Georgio Pandarez
Hi,

I have noticed that Riak-CS can shard (that is, split) large objects
automatically across nodes. I would like to achieve a similar outcome with
Riak itself. Is there any best practice for achieving this? Could a portion of
Riak-CS be reused, or should I just bite the bullet and use Riak-CS?

Latency is key for my application, and I wanted to avoid the additional
layer that Riak-CS introduces.
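
(For what it's worth, a common client-side workaround is to chunk large values
manually. Below is a minimal sketch using the riak-erlang-client (riakc); the
module name, bucket layout, 1 MB chunk size, and manifest format are all
invented for illustration, and a real implementation would also need read-side
reassembly, conflict handling, and cleanup of orphaned chunks. It is a sketch
of the general idea, not a drop-in replacement for what Riak-CS does.)

-module(chunked_put).
-export([put_large/4]).

%% Illustrative choice: 1 MB chunks, stored under <Key>/1, <Key>/2, ...
-define(CHUNK_SIZE, 1048576).

put_large(Pid, Bucket, Key, Value) when is_binary(Value) ->
    Chunks = split(Value, []),
    %% Write each chunk under its own derived key.
    lists:foreach(
        fun({I, Chunk}) ->
            CKey = iolist_to_binary([Key, "/", integer_to_list(I)]),
            ok = riakc_pb_socket:put(Pid, riakc_obj:new(Bucket, CKey, Chunk))
        end,
        lists:zip(lists:seq(1, length(Chunks)), Chunks)),
    %% A small manifest under the original key records how many chunks a
    %% reader has to fetch and concatenate.
    Manifest = term_to_binary({chunks, length(Chunks), total_size, byte_size(Value)}),
    riakc_pb_socket:put(Pid, riakc_obj:new(Bucket, Key, Manifest)).

split(<<>>, Acc) ->
    lists:reverse(Acc);
split(<<Chunk:?CHUNK_SIZE/binary, Rest/binary>>, Acc) ->
    split(Rest, [Chunk | Acc]);
split(Last, Acc) ->
    lists:reverse([Last | Acc]).

(Keeping the chunks on separate keys is what spreads the value across the ring;
the trade-off is that one logical write is no longer a single atomic operation,
which is part of what the Riak-CS layer handles for you.)
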
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Upgrade from 1.3.1 to 1.4.2 = high IO

2013-12-10 Thread Simon Effenberg
Hi @list,

I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2. After
upgrading the first node (out of 12), this node seems to be doing many merges:
the sst_* directories change in size rapidly and the node is at 100% disk
utilization all the time.

I know there is this note about it:

The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
will initiate an automatic conversion that could pause the startup of
each node by 3 to 7 minutes. The leveldb data in level #1 is being
adjusted such that level #1 can operate as an overlapped data level
instead of as a sorted data level. The conversion is simply the
reduction of the number of files in level #1 to being less than eight
via normal compaction of data from level #1 into level #2. This is
a one time conversion.

but it looks much more invasive than explained there, or it may not have
anything to do with the merges I'm (probably) seeing.

Is this normal behavior, or can I do anything about it?

At the moment I'm stuck in the middle of the upgrade procedure, because this
high IO load would probably lead to high response times.

Also, we have a lot of data (~950 GB per node).

Cheers
Simon

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Upgrade from 1.3.1 to 1.4.2 = high IO

2013-12-10 Thread Simon Effenberg
Hi Matthew,

see inline..

On Tue, 10 Dec 2013 10:38:03 -0500
Matthew Von-Maszewski matth...@basho.com wrote:

 The sad truth is that you are not the first to see this problem.  And yes, it 
 has to do with your 950GB per node dataset.  And no, nothing to do but sit 
 through it at this time.
 
 While I did extensive testing around upgrade times before shipping 1.4, 
 apparently there are data configurations I did not anticipate.  You are 
 likely seeing a cascade where a shift of one file from level-1 to level-2 is 
 causing a shift of another file from level-2 to level-3, which causes a 
 level-3 file to shift to level-4, etc … then the next file shifts from 
 level-1.
 
 The bright side of this pain is that you will end up with better write 
 throughput once all the compaction ends.

I'll have to deal with that.. but my problem now is: if I'm doing this
node by node, it looks like 2i searches aren't possible while 1.3 and
1.4 nodes coexist in the cluster. Is there any problem that would force me
into a 2i-repair marathon, or can I simply wait some hours for each
node until all merges are done before I upgrade the next one? (2i
searches can fail for some time.. the app isn't having problems with
that, but are new inserts with 2i indexes processed successfully, or do
I have to do the 2i repair?)

/s

one other good thing: saving disk space is an advantage ;)..


 
 Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that 
 is not going to help you today.
 
 Matthew
 
 On Dec 10, 2013, at 10:26 AM, Simon Effenberg seffenb...@team.mobile.de 
 wrote:
 
  Hi @list,
  
  I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
  upgrading the first node (out of 12) this node seems to do many merges.
  the sst_* directories changes in size rapidly and the node is having
  a disk utilization of 100% all the time.
  
  I know that there is something like that:
  
  The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
  will initiate an automatic conversion that could pause the startup of
  each node by 3 to 7 minutes. The leveldb data in level #1 is being
  adjusted such that level #1 can operate as an overlapped data level
  instead of as a sorted data level. The conversion is simply the
  reduction of the number of files in level #1 to being less than eight
  via normal compaction of data from level #1 into level #2. This is
  a one time conversion.
  
  but it looks much more invasive than explained here or doesn't have to
  do anything with the (probably seen) merges.
  
  Is this normal behavior or could I do anything about it?
  
  At the moment I'm stucked with the upgrade procedure because this high
  IO load would probably lead to high response times.
  
  Also we have a lot of data (per node ~950 GB).
  
  Cheers
  Simon
  
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
 


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon: + 49-(0)30-8109 - 7173
Fax: + 49-(0)30-8109 - 7131

Mail: seffenb...@team.mobile.de
Web:www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Simon Effenberg
I had something like that once but with version 1.2 or 1.3 .. a rolling
restart helped in my case.

/s

On Mon, 9 Dec 2013 09:48:12 -0500
Ivaylo Panitchkov ipanitch...@hibernum.com wrote:

 Hello,
 
 We have a prod cluster of four machines running riak (1.1.4 2012-06-19)
 Debian x86_64.
 Two days ago one of the servers went down because of a hardware failure.
 I force-removed the machine in question to re-balance the cluster before
 adding the new machine.
 Since then the cluster is operating properly, but I noticed some handoffs
 are stalled now.
 I had a similar situation a while ago that was solved by simply forcing the
 handoffs, but this time the same approach didn't work.
 Any ideas, solutions or just hints are greatly appreciated.
 Below are the cluster statuses. I replaced the IP addresses for security reasons.
 
 
 
 ~# riak-admin member_status
 Attempting to restart script through sudo -u riak
 ================================= Membership ==================================
 Status     Ring    Pending    Node
 -------------------------------------------------------------------------------
 valid      45.3%    34.4%     'r...@aaa.aaa.aaa.aaa'
 valid      26.6%    32.8%     'r...@bbb.bbb.bbb.bbb'
 valid      28.1%    32.8%     'r...@ccc.ccc.ccc.ccc'
 -------------------------------------------------------------------------------
 Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
 
 
 
 ~# riak-admin ring_status
 Attempting to restart script through sudo -u riak
 ================================== Claimant ===================================
 Claimant:  'r...@aaa.aaa.aaa.aaa'
 Status: up
 Ring Ready: true
 
 ============================== Ownership Handoff ==============================
 Owner:  r...@aaa.aaa.aaa.aaa
 Next Owner: r...@bbb.bbb.bbb.bbb
 
 Index: 22835963083295358096932575511191922182123945984
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 Index: 570899077082383952423314387779798054553098649600
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 Index: 1118962191081472546749696200048404186924073353216
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 Index: 1392993748081016843912887106182707253109560705024
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 ---
 Owner:  r...@aaa.aaa.aaa.aaa
 Next Owner: r...@ccc.ccc.ccc.ccc
 
 Index: 114179815416476790484662877555959610910619729920
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 Index: 662242929415565384811044689824565743281594433536
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 Index: 1210306043414653979137426502093171875652569137152
   Waiting on: [riak_kv_vnode]
   Complete:   [riak_pipe_vnode]
 
 ---
 
 ============================== Unreachable Nodes ==============================
 All nodes are up and reachable
 
 
 
 Thanks in advance,
 Ivaylo
 
 
 
 -- 
 Ivaylo Panitchkov
 Software developer
 Hibernum Creations Inc.
 
 This email is confidential and may also be legally privileged. If you have
 received this email in error, please notify us immediately by reply email
 and then delete this message from your system. Please do not copy it or use
 it for any purpose or disclose its content.


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon: + 49-(0)30-8109 - 7173
Fax: + 49-(0)30-8109 - 7131

Mail: seffenb...@team.mobile.de
Web:www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Upgrade from 1.3.1 to 1.4.2 = high IO

2013-12-10 Thread Matthew Von-Maszewski
The sad truth is that you are not the first to see this problem.  And yes, it 
has to do with your 950GB per node dataset.  And no, nothing to do but sit 
through it at this time.

While I did extensive testing around upgrade times before shipping 1.4, 
apparently there are data configurations I did not anticipate.  You are likely 
seeing a cascade where a shift of one file from level-1 to level-2 is causing a 
shift of another file from level-2 to level-3, which causes a level-3 file to 
shift to level-4, etc … then the next file shifts from level-1.

The bright side of this pain is that you will end up with better write 
throughput once all the compaction ends.

Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that is 
not going to help you today.

Matthew

On Dec 10, 2013, at 10:26 AM, Simon Effenberg seffenb...@team.mobile.de wrote:

 Hi @list,
 
 I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
 upgrading the first node (out of 12) this node seems to do many merges.
 the sst_* directories changes in size rapidly and the node is having
 a disk utilization of 100% all the time.
 
 I know that there is something like that:
 
 The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
 will initiate an automatic conversion that could pause the startup of
 each node by 3 to 7 minutes. The leveldb data in level #1 is being
 adjusted such that level #1 can operate as an overlapped data level
 instead of as a sorted data level. The conversion is simply the
 reduction of the number of files in level #1 to being less than eight
 via normal compaction of data from level #1 into level #2. This is
 a one time conversion.
 
 but it looks much more invasive than explained here or doesn't have to
 do anything with the (probably seen) merges.
 
 Is this normal behavior or could I do anything about it?
 
 At the moment I'm stucked with the upgrade procedure because this high
 IO load would probably lead to high response times.
 
 Also we have a lot of data (per node ~950 GB).
 
 Cheers
 Simon
 
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Upgrade from 1.3.1 to 1.4.2 = high IO

2013-12-10 Thread Matthew Von-Maszewski
2i is not my expertise, so I had to discuss your concerns with another Basho
developer.  He says:

Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format.  
You must wait for all nodes to update if you desire to use the new 2i query.  
The 2i data will properly write/update on both 1.3 and 1.4 machines during the 
migration.

Does that answer your question?
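
(As a concrete way to convince yourself that index writes keep landing during
the rolling upgrade, something like the following sketch can be run against any
node. It assumes the riak-erlang-client (riakc); the bucket, key, and "status"
index are invented for illustration, and older client releases expose
get_index/4 rather than get_index_eq/4.)

{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),

%% Write an object carrying a binary secondary index.
Obj = riakc_obj:new(<<"users">>, <<"u1">>, <<"{}">>, "application/json"),
MD  = riakc_obj:set_secondary_index(riakc_obj:get_update_metadata(Obj),
                                    [{{binary_index, "status"}, [<<"active">>]}]),
ok  = riakc_pb_socket:put(Pid, riakc_obj:update_metadata(Obj, MD)),

%% Query the index back. On a mixed 1.3/1.4 cluster this query can still fail
%% depending on which node coordinates it, even though the index data itself
%% was written correctly on both old and new nodes.
{ok, Results} = riakc_pb_socket:get_index_eq(Pid, <<"users">>,
                                             {binary_index, "status"}, <<"active">>).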


And yes, you might see available disk space increase during the upgrade 
compactions if your dataset contains numerous delete tombstones.  The Riak 
2.0 code includes a new feature called aggressive delete for leveldb.  This 
feature is more proactive in pushing delete tombstones through the levels to 
free up disk space much more quickly (especially if you perform block deletes 
every now and then).

Matthew


On Dec 10, 2013, at 10:44 AM, Simon Effenberg seffenb...@team.mobile.de wrote:

 Hi Matthew,
 
 see inline..
 
 On Tue, 10 Dec 2013 10:38:03 -0500
 Matthew Von-Maszewski matth...@basho.com wrote:
 
 The sad truth is that you are not the first to see this problem.  And yes, 
 it has to do with your 950GB per node dataset.  And no, nothing to do but 
 sit through it at this time.
 
 While I did extensive testing around upgrade times before shipping 1.4, 
 apparently there are data configurations I did not anticipate.  You are 
 likely seeing a cascade where a shift of one file from level-1 to level-2 is 
 causing a shift of another file from level-2 to level-3, which causes a 
 level-3 file to shift to level-4, etc … then the next file shifts from 
 level-1.
 
 The bright side of this pain is that you will end up with better write 
 throughput once all the compaction ends.
 
 I have to deal with that.. but my problem is now, if I'm doing this
 node by node it looks like 2i searches aren't possible while 1.3 and
 1.4 nodes exists in the cluster. Is there any problem which leads me to
 an 2i repair marathon or could I easily wait for some hours for each
 node until all merges are done before I upgrade the next one? (2i
 searches can fail for some time.. the APP isn't having problems with
 that but are new inserts with 2i indices processed successfully or do
 I have to do the 2i repair?)
 
 /s
 
 one other good think: saving disk space is one advantage ;)..
 
 
 
 Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but that 
 is not going to help you today.
 
 Matthew
 
 On Dec 10, 2013, at 10:26 AM, Simon Effenberg seffenb...@team.mobile.de 
 wrote:
 
 Hi @list,
 
 I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
 upgrading the first node (out of 12) this node seems to do many merges.
 the sst_* directories changes in size rapidly and the node is having
 a disk utilization of 100% all the time.
 
 I know that there is something like that:
 
 The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
 will initiate an automatic conversion that could pause the startup of
 each node by 3 to 7 minutes. The leveldb data in level #1 is being
 adjusted such that level #1 can operate as an overlapped data level
 instead of as a sorted data level. The conversion is simply the
 reduction of the number of files in level #1 to being less than eight
 via normal compaction of data from level #1 into level #2. This is
 a one time conversion.
 
 but it looks much more invasive than explained here or doesn't have to
 do anything with the (probably seen) merges.
 
 Is this normal behavior or could I do anything about it?
 
 At the moment I'm stucked with the upgrade procedure because this high
 IO load would probably lead to high response times.
 
 Also we have a lot of data (per node ~950 GB).
 
 Cheers
 Simon
 
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
 
 
 
 -- 
 Simon Effenberg | Site Ops Engineer | mobile.international GmbH
 Fon: + 49-(0)30-8109 - 7173
 Fax: + 49-(0)30-8109 - 7131
 
 Mail: seffenb...@team.mobile.de
 Web:www.mobile.de
 
 Marktplatz 1 | 14532 Europarc Dreilinden | Germany
 
 
 Geschäftsführer: Malte Krüger
 HRB Nr.: 18517 P, Amtsgericht Potsdam
 Sitz der Gesellschaft: Kleinmachnow 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Upgrade from 1.3.1 to 1.4.2 = high IO

2013-12-10 Thread Simon Effenberg
Hi Matthew,

thanks!.. that answers my questions!

Cheers
Simon

On Tue, 10 Dec 2013 11:08:32 -0500
Matthew Von-Maszewski matth...@basho.com wrote:

 2i is not my expertise, so I had to discuss you concerns with another Basho 
 developer.  He says:
 
 Between 1.3 and 1.4, the 2i query did change but not the 2i on-disk format.  
 You must wait for all nodes to update if you desire to use the new 2i query.  
 The 2i data will properly write/update on both 1.3 and 1.4 machines during 
 the migration.
 
 Does that answer your question?
 
 
 And yes, you might see available disk space increase during the upgrade 
 compactions if your dataset contains numerous delete tombstones.  The Riak 
 2.0 code includes a new feature called aggressive delete for leveldb.  This 
 feature is more proactive in pushing delete tombstones through the levels to 
 free up disk space much more quickly (especially if you perform block deletes 
 every now and then).
 
 Matthew
 
 
 On Dec 10, 2013, at 10:44 AM, Simon Effenberg seffenb...@team.mobile.de 
 wrote:
 
  Hi Matthew,
  
  see inline..
  
  On Tue, 10 Dec 2013 10:38:03 -0500
  Matthew Von-Maszewski matth...@basho.com wrote:
  
  The sad truth is that you are not the first to see this problem.  And yes, 
  it has to do with your 950GB per node dataset.  And no, nothing to do but 
  sit through it at this time.
  
  While I did extensive testing around upgrade times before shipping 1.4, 
  apparently there are data configurations I did not anticipate.  You are 
  likely seeing a cascade where a shift of one file from level-1 to level-2 
  is causing a shift of another file from level-2 to level-3, which causes a 
  level-3 file to shift to level-4, etc … then the next file shifts from 
  level-1.
  
  The bright side of this pain is that you will end up with better write 
  throughput once all the compaction ends.
  
  I have to deal with that.. but my problem is now, if I'm doing this
  node by node it looks like 2i searches aren't possible while 1.3 and
  1.4 nodes exists in the cluster. Is there any problem which leads me to
  an 2i repair marathon or could I easily wait for some hours for each
  node until all merges are done before I upgrade the next one? (2i
  searches can fail for some time.. the APP isn't having problems with
  that but are new inserts with 2i indices processed successfully or do
  I have to do the 2i repair?)
  
  /s
  
  one other good think: saving disk space is one advantage ;)..
  
  
  
  Riak 2.0's leveldb has code to prevent/reduce compaction cascades, but 
  that is not going to help you today.
  
  Matthew
  
  On Dec 10, 2013, at 10:26 AM, Simon Effenberg seffenb...@team.mobile.de 
  wrote:
  
  Hi @list,
  
  I'm trying to upgrade our Riak cluster from 1.3.1 to 1.4.2 .. after
  upgrading the first node (out of 12) this node seems to do many merges.
  the sst_* directories changes in size rapidly and the node is having
  a disk utilization of 100% all the time.
  
  I know that there is something like that:
  
  The first execution of 1.4.0 leveldb using a 1.3.x or 1.2.x dataset
  will initiate an automatic conversion that could pause the startup of
  each node by 3 to 7 minutes. The leveldb data in level #1 is being
  adjusted such that level #1 can operate as an overlapped data level
  instead of as a sorted data level. The conversion is simply the
  reduction of the number of files in level #1 to being less than eight
  via normal compaction of data from level #1 into level #2. This is
  a one time conversion.
  
  but it looks much more invasive than explained here or doesn't have to
  do anything with the (probably seen) merges.
  
  Is this normal behavior or could I do anything about it?
  
  At the moment I'm stucked with the upgrade procedure because this high
  IO load would probably lead to high response times.
  
  Also we have a lot of data (per node ~950 GB).
  
  Cheers
  Simon
  
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
  
  
  
  -- 
  Simon Effenberg | Site Ops Engineer | mobile.international GmbH
  Fon: + 49-(0)30-8109 - 7173
  Fax: + 49-(0)30-8109 - 7131
  
  Mail: seffenb...@team.mobile.de
  Web:www.mobile.de
  
  Marktplatz 1 | 14532 Europarc Dreilinden | Germany
  
  
  Geschäftsführer: Malte Krüger
  HRB Nr.: 18517 P, Amtsgericht Potsdam
  Sitz der Gesellschaft: Kleinmachnow 
 


-- 
Simon Effenberg | Site Ops Engineer | mobile.international GmbH
Fon: + 49-(0)30-8109 - 7173
Fax: + 49-(0)30-8109 - 7131

Mail: seffenb...@team.mobile.de
Web:www.mobile.de

Marktplatz 1 | 14532 Europarc Dreilinden | Germany


Geschäftsführer: Malte Krüger
HRB Nr.: 18517 P, Amtsgericht Potsdam
Sitz der Gesellschaft: Kleinmachnow 

___
riak-users mailing list
riak-users@lists.basho.com

2i stopped working on LevelDB with multi backend

2013-12-10 Thread Chris Read
We just rebuilt our test environment (something we do every month) and
suddenly we get the following error when trying to use 2i:

{error,{error,{indexes_not_supported,riak_kv_multi_backend}}}

But looking at the properties of the bucket, it's set to use leveldb:

# curl -k https://localhost:8069/riak/eleveldb/ | jq .
{
  "props": {
    "young_vclock": 20,
    "w": "quorum",
    "small_vclock": 50,
    "rw": "quorum",
    "r": "quorum",
    "linkfun": {
      "fun": "mapreduce_linkfun",
      "mod": "riak_kv_wm_link_walker"
    },
    "last_write_wins": false,
    "dw": "quorum",
    "chash_keyfun": {
      "fun": "chash_std_keyfun",
      "mod": "riak_core_util"
    },
    "big_vclock": 50,
    "basic_quorum": false,
    "backend": "eleveldb_data",
    "allow_mult": false,
    "n_val": 3,
    "name": "eleveldb",
    "notfound_ok": true,
    "old_vclock": 86400,
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0
  }
}

Here's the relevant app.config snippet:

{storage_backend, riak_kv_multi_backend},
{multi_backend_default, bitcask_data},
{multi_backend, [
  {bitcask_data, riak_kv_bitcask_backend,
    [
     {data_root, "/srv/riak/data/bitcask/data"},
     %%{io_mode, nif},
     {max_file_size, 2147483648},           %% 2G
     {merge_window, always},
     {frag_merge_trigger, 30},              %% Merge at 30% dead keys
     {dead_bytes_merge_trigger, 134217728}, %% Merge files that have more than 128MB dead
     {frag_threshold, 25},                  %% Files that have 25% dead keys will be merged too
     {dead_bytes_threshold, 67108864},      %% Include files that have 64MB of dead space in merges
     {small_file_threshold, 10485760},      %% Files smaller than 100MB will not be merged
     {log_needs_merge, true},               %% Log when we need to merge...
     {sync_strategy, none}
    ]
  },
  {eleveldb_data, riak_kv_eleveldb_backend,
    [{data_root, "/srv/riak/data/eleveldb/files"},
     {write_buffer_size_min, 31457280},     %% 30 MB in bytes
     {write_buffer_size_max, 62914560},     %% 60 MB in bytes
     {max_open_files, 20},                  %% Maximum number of files open at once per partition
     {sst_block_size, 4096},                %% 4K blocks
     {cache_size, 8388608}                  %% 8MB default cache size per-partition
    ]
  }
]},

Anyone have any ideas?

We're using Ubuntu 12.04 with the Basho Riak 1.4.2 .deb. The only
change to this environment has been to upgrade the kernel from
3.5.0-26 to 3.8.0-31-generic, but I'd be very surprised if that broke
2i...
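
(One generic thing to rule out with a multi-backend config like the one above,
offered as a guess from the config alone rather than the resolution of this
thread: the bucket's backend property has to name one of the entries configured
under multi_backend; if a bucket ends up served by the default bitcask backend
instead, 2i operations fail, because bitcask has no index support. A minimal
sketch of checking both sides from `riak attach` on a node, with the bucket
name taken from the curl output above:)

%% Effective bucket properties; look for the {backend, ...} entry.
riak_core_bucket:get_bucket(<<"eleveldb">>).

%% Configured multi_backend entries; the backend property above should
%% match one of these names.
application:get_env(riak_kv, multi_backend).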

Thanks,

Chris

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak nagios script

2013-12-10 Thread Hector Castro
Hello Kathleen,

Have you executed the `make encrypt` target to build the `check_node`
binary? [0] From there, I copied it to the Riak node and invoked it
like this:

$ /usr/lib/riak/erts-5.9.1/bin/escript check_node --node riak@127.0.0.1 riak_kv_up
OKAY: riak_kv is running on riak@127.0.0.1

I used the entire path to escript because the bin directory under erts
was not in my PATH by default.

--
Hector

[0] https://github.com/basho/riak_nagios#building

On Mon, Dec 9, 2013 at 7:35 PM, kzhang kzh...@wayfair.com wrote:
 Also, when running

 https://github.com/basho/riak_nagios/blob/master/src/check_node.erl

 I ran into the error:

 ** exception error: undefined function getopt:parse/2
  in function  check_node:main/2 (check_node.erl, line 15)




 --
 View this message in context: 
 http://riak-users.197444.n3.nabble.com/riak-nagios-script-tp4030025p4030026.html
 Sent from the Riak Users mailing list archive at Nabble.com.

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak nagios script

2013-12-10 Thread kzhang
Thanks Hector.

Here is how I executed the script.

 I downloaded and installed the erlang shell from 
http://www.erlang.org/documentation/doc-5.3/doc/getting_started/getting_started.html

started erlang OTP:

[root@MYRIAKNODE otp_src_R16B02]# erl -s toolbar
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [async-threads:10] [hipe]
[kernel-poll:false]

Eshell V5.10.3  (abort with ^G)

grabbed the source code
(https://github.com/basho/riak_nagios/blob/master/src/check_node.erl),
compiled it: 
c(check_node).

ran it:
check_node:main([{node, 'xx.xx.xx.xx'}]).   

then got:

** exception error: undefined function getopt:parse/2
 in function  check_node:main/2 (check_node.erl, line 15)

Here is where I am. I found this:

https://github.com/jcomellas/getopt

I grabbed the source code and compiled it under otp_src_R16B02.

ran it again:
2> check_node:main([{node, 'xx.xx.xx.xx'}]).
UNKNOWN: invalid_option_arg {check,{node,'xx.xx.xx.xx'}}

Am I on the right path?

Thanks,

Kathleen












--
View this message in context: 
http://riak-users.197444.n3.nabble.com/riak-nagios-script-tp4030025p4030037.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak Recap for December 4 - 9

2013-12-10 Thread Mark Phillips
Morning, Afternoon, Evening to All -

Here's today's Recap. Enjoy.

Also, if you're around Raleigh/Durham and want to have drinks next
week, let me know.

Mark
twitter.com/pharkmillups
---

Riak Recap for December 4 - 9
==

The recording of last Friday's Riak Community Hangout is now
available. This one is all about Riak Security and the exciting
history behind allow_mult=false. It's well worth your time.
- http://www.youtube.com/watch?v=n8m8xlizekg

John Daily et al. are talking about Riak 2.0 tomorrow night at the
Braintree offices in Chicago. This is not to be missed.
- www.meetup.com/Chicago-Riak-Meetup/events/151516252/

Tom Santero and I will be at the West End Ruby Meetup next week in
Durham, NC to talk about Riak.
- http://www.meetup.com/raleighrb/events/154001722/

Riakpbc, nlf's Node.js protocol buffers client for Riak, hit version
1.0.5. (Also, h/t to nlf for cranking out bug fixes.)
- https://npmjs.org/package/riakpbc

Riaks, Noah Isaacson's Riak client, just hit 2.0.2.
- https://npmjs.org/package/riaks

We wrote up some details on how the team at CityMaps is using Riak in
production.
- 
https://basho.com/social-map-innovator-and-riak-user-citymaps-predicts-where-you-want-to-go/

Vincent Chinedu Okonkwo open sourced a Lager backend for Mozilla’s Heka.
- https://github.com/codmajik/herlka

Vic Iglesias wrote a great post about getting Riak CS and Eucalyptus
running together.
-  
http://testingclouds.wordpress.com/2013/12/10/testing-riak-cs-with-eucalyptus/

Q & A
- http://stackoverflow.com/questions/20366695/truncate-a-riak-database
- http://stackoverflow.com/questions/20440450/riak-databse-and-spring-mvc
- 
http://stackoverflow.com/questions/20461280/are-single-client-writes-strongly-ordered-in-dynamodb-or-riak

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak nagios script

2013-12-10 Thread Alex Moore
Hi Kathleen,

If you’d like to run riak_nagios from the erl command line, you’ll need to
compile everything in src and include it in the path along with the getopt
library.

You can compile everything with a simple call to make, and then include it in
the path with "erl -pa deps/*/ebin ebin".
Once everything is loaded, you can call
check_node:main(["--node", "dev1@127.0.0.1", "riak_kv_up"]). or something
similar to run it.  The last parameter in the Args array will be the check to
make.
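
(Spelled out as a hypothetical session; the checkout path and node name are
examples only, and getopt has to be on the load path or check_node fails with
the undefined function getopt:parse/2 error shown earlier in this thread:)

%% from the riak_nagios checkout, after running make:
%%   erl -pa ebin -pa deps/getopt/ebin
1> check_node:main(["--node", "riak@127.0.0.1", "riak_kv_up"]).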

Is there a reason you’re running it this way instead of compiling it to an 
escript and running it from bash? 

Thanks,
Alex Moore

On December 10, 2013 at 1:26:20 PM, kzhang (kzh...@wayfair.com) wrote:

Thanks Hector.  

Here is how I executed the script.  

I downloaded and installed the erlang shell from  
http://www.erlang.org/documentation/doc-5.3/doc/getting_started/getting_started.html
  

started erlang OTP:  

root@MYRIAKNODE otp_src_R16B02]# erl -s toolbar  
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [async-threads:10] [hipe]  
[kernel-poll:false]  

Eshell V5.10.3 (abort with ^G)  

grabbed the source code  
(https://github.com/basho/riak_nagios/blob/master/src/check_node.erl),  
compiled it:  
c(check_node).  

ran it:  
check_node:main([{node, 'xx.xx.xx.xx'}]).   

then got:  

** exception error: undefined function getopt:parse/2  
in function check_node:main/2 (check_node.erl, line 15)  

Here is where I am. I found this:  

https://github.com/jcomellas/getopt  

I grabbed the source code, compiled it under otp_src_R16B02.  

ran it again:  
2 check_node:main([{node, 'xx.xx.xx.xx'}]).  
UNKNOWN: invalid_option_arg {check,{node,'xx.xx.xx.xx'}}  

Am I on the right path?  

Thanks,  

Kathleen  












--  
View this message in context: 
http://riak-users.197444.n3.nabble.com/riak-nagios-script-tp4030025p4030037.html
  
Sent from the Riak Users mailing list archive at Nabble.com.  

___  
riak-users mailing list  
riak-users@lists.basho.com  
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com  
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Jeppe Toustrup
What does "riak-admin transfers" tell you? Are there any transfers in progress?
You can try to set the number of allowed transfers per host to 0 and
then back to 2 (the default), or whatever you want, in order to restart
any transfers which may be in progress. You can do that with the
"riak-admin transfer-limit <number>" command.

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social

On 9 December 2013 15:48, Ivaylo Panitchkov ipanitch...@hibernum.com wrote:


 Hello,

 We have a prod cluster of four machines running riak (1.1.4 2012-06-19) 
 Debian x86_64.
 Two days ago one of the servers went down because of a hardware failure.
 I force-removed the machine in question to re-balance the cluster before 
 adding the new machine.
 Since then the cluster is operating properly, but I noticed some handoffs are 
 stalled now.
 I had similar situation awhile ago that was solved by simply forcing the 
 handoffs, but this time the same approach didn't work.
 Any ideas, solutions or just hints are greatly appreciated.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Ivaylo Panitchkov
Hello,
Below is the transfers info:

~# riak-admin transfers
Attempting to restart script through sudo -u riak
'r...@ccc.ccc.ccc.ccc' waiting to handoff 7 partitions
'r...@bbb.bbb.bbb.bbb' waiting to handoff 7 partitions
'r...@aaa.aaa.aaa.aaa' waiting to handoff 5 partitions

~# riak-admin member_status
Attempting to restart script through sudo -u riak
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      45.3%    34.4%     'r...@aaa.aaa.aaa.aaa'
valid      26.6%    32.8%     'r...@bbb.bbb.bbb.bbb'
valid      28.1%    32.8%     'r...@ccc.ccc.ccc.ccc'
-------------------------------------------------------------------------------

It's been stuck with all those handoffs for a few days now.
riak-admin ring_status gives me the same info as the one I mentioned when I
opened the case.
I noticed AAA.AAA.AAA.AAA experiences more load than the other servers, as it's
responsible for almost half of the data.
Is it safe to add another machine to the cluster in order to relieve
AAA.AAA.AAA.AAA even though the issue with the handoffs is not yet resolved?

Thanks,
Ivaylo



On Tue, Dec 10, 2013 at 3:04 PM, Jeppe Toustrup je...@falconsocial.comwrote:

 What does riak-admin transfers tell you? Are there any transfers in
 progress?
 You can try to set the amount of allowed transfers per host to 0 and
 then back to 2 (the default) or whatever you want, in order to restart
 any transfers which may be in progress. You can do that with the
 riak-admin transfer-limit number command.

 --
 Jeppe Fihl Toustrup
 Operations Engineer
 Falcon Social


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Jeppe Toustrup
Take a look at this thread from November where I experienced a
similar problem:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-November/014027.html

The following mails in the thread mention things you can try to correct
the problem, and what I ended up doing with the help of Basho
employees.

-- 
Jeppe Fihl Toustrup
Operations Engineer
Falcon Social

On 10 December 2013 22:03, Ivaylo Panitchkov ipanitch...@hibernum.com wrote:
 Hello,
 Below is the transfers info:

 ~# riak-admin transfers

 Attempting to restart script through sudo -u riak
 'r...@ccc.ccc.ccc.ccc' waiting to handoff 7 partitions
 'r...@bbb.bbb.bbb.bbb' waiting to handoff 7 partitions
 'r...@aaa.aaa.aaa.aaa' waiting to handoff 5 partitions


 ~# riak-admin member_status
 Attempting to restart script through sudo -u riak
 = Membership
 ==
 Status RingPendingNode
 ---
 valid  45.3% 34.4%'r...@aaa.aaa.aaa.aaa'
 valid  26.6% 32.8%'r...@bbb.bbb.bbb.bbb'
 valid  28.1% 32.8%'r...@ccc.ccc.ccc.ccc'
 ---

 It's stuck with all those handoffs for few days now.
 riak-admin ring_status gives me the same info like the one I mentioned when
 opened the case.
 I noticed AAA.AAA.AAA.AAA experience more load than other servers as it's
 responsible for almost half of the data.
 Is it safe to add another machine to the cluster in order to relief
 AAA.AAA.AAA.AAA even when the issue with handoffs is not yet resolved?

 Thanks,
 Ivaylo

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak nagios script

2013-12-10 Thread kzhang
Hi Alex,

Thanks.

I am completely new to Erlang. When googling how to run an Erlang program, I
came across
http://www.erlang.org/documentation/doc-5.3/doc/getting_started/getting_started.html
and that's how I got started.

To run the script using escript, based on
http://www.erlang.org/doc/man/escript.html, it looks like I don't need to
compile the scripts, so I ran:

/usr/local/bin/escript check_node --node riak@127.0.0.1 check_riak_repl

and got:
escript: Failed to open file: check_node





--
View this message in context: 
http://riak-users.197444.n3.nabble.com/riak-nagios-script-tp4030025p4030043.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Stalled handoffs on a prod cluster after server crash

2013-12-10 Thread Mark Phillips
Hi Ivaylo,

Is there anything useful in console.log on any (or all) of the nodes? If
so, throw it in a gist and we'll take a look at it.

Mark

On Tue, Dec 10, 2013 at 1:13 PM, Jeppe Toustrup je...@falconsocial.com wrote:
 Try to take a look at this thread from November where I experienced a
 similar problem:
 http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-November/014027.html

 The following mails in the thread mentions things you try to correct
 the problem, and what I ended up doing with the help of Basho
 employees.

 --
 Jeppe Fihl Toustrup
 Operations Engineer
 Falcon Social

 On 10 December 2013 22:03, Ivaylo Panitchkov ipanitch...@hibernum.com wrote:
 Hello,
 Below is the transfers info:

 ~# riak-admin transfers

 Attempting to restart script through sudo -u riak
 'r...@ccc.ccc.ccc.ccc' waiting to handoff 7 partitions
 'r...@bbb.bbb.bbb.bbb' waiting to handoff 7 partitions
 'r...@aaa.aaa.aaa.aaa' waiting to handoff 5 partitions


 ~# riak-admin member_status
 Attempting to restart script through sudo -u riak
 = Membership
 ==
 Status RingPendingNode
 ---
 valid  45.3% 34.4%'r...@aaa.aaa.aaa.aaa'
 valid  26.6% 32.8%'r...@bbb.bbb.bbb.bbb'
 valid  28.1% 32.8%'r...@ccc.ccc.ccc.ccc'
 ---

 It's stuck with all those handoffs for few days now.
 riak-admin ring_status gives me the same info like the one I mentioned when
 opened the case.
 I noticed AAA.AAA.AAA.AAA experience more load than other servers as it's
 responsible for almost half of the data.
 Is it safe to add another machine to the cluster in order to relief
 AAA.AAA.AAA.AAA even when the issue with handoffs is not yet resolved?

 Thanks,
 Ivaylo

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com