Re: Trying to identify the cause of these errors.

2017-01-10 Thread Benjamin Roth
Why don't you create a JIRA issue?
Assertion errors, leaks and NPEs usually indicate a bug. This really should
not happen.

2017-01-11 4:35 GMT+01:00 Gopal, Dhruva :

> Any suggestions/recommendations? Anything will help at this point. Thanks!
>
> Regards,
>
> *DHRUVA GOPAL*
>
> *sr. MANAGER, ENGINEERING*
>
> *REPORTING, ANALYTICS AND BIG DATA*
>
> *+1 408.325.2011* *WORK*
>
> *+1 408.219.1094* *MOBILE*
>
> *UNITED STATES*
>
> *dhruva.go...@aspect.com  *
>
> *aspect.com *
>
>
>
>
> From: Dhruva Gopal
> Date: Monday, January 9, 2017 at 1:42 PM
> To: "user@cassandra.apache.org"
> Cc: Richard Ney
> Subject: Trying to identify the cause of these errors.
>
> My colleague (Richard Ney) has already been in touch with you on a couple
> of other issues we’ve seen in the past. Our development team has been
> trying to track down some new issues we’ve been seeing on one of our pre-prod
> environments where we’ve been having consistent failures very often (every
> day or every 2-3 days), even if load/number of transactions are very light.
> We’re running a 2 data center deployment with 3 nodes in each data center.
> Our tables are setup with replication factor = 2 and we have 16G dedicated
> to the heap with the G1GC for garbage collection. Our systems are AWS
> M4.2xlarge with 8 CPUs and 32GB of RAM and we have 2 general purpose EBS
> volumes on each node of 500GB each. Once we hit this it seems like the only
> way to recover is to shutdown the cluster and restart. Running repairs
> after the restart often results in failures and we pretty much end up
> having to truncate the tables before starting up clean again. We are not
> sure if the two are inter-related. We pretty much see the same issue on all
> the nodes. If anyone has any tips or any suggestions on how to diagnose
> this further, it will help a great deal! The issues are:
>
>
>
> *Issue 1:* Once the errors occur they just repeat for a bit followed by
> the errors in issue 2.
>
> INFO  [CompactionExecutor:165] 2017-01-08 08:32:39,915
> AutoSavingCache.java:386 - Saved KeyCache (63 items) in 5 ms
> INFO  [IndexSummaryManager:1] 2017-01-08 08:32:41,438
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [HANDSHAKE-ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224]
> 2017-01-08 09:30:03,988 OutboundTcpConnection.java:505 - Handshaking
> version with ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224
> INFO  [IndexSummaryManager:1] 2017-01-08 09:32:41,440
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> WARN  [SharedPool-Worker-9] 2017-01-08 10:30:00,116
> BatchStatement.java:289 - Batch of prepared statements for
> [manhattan.rcmessages] is of size 9264, exceeding specified threshold of
> 5120 by 4144.
> INFO  [IndexSummaryManager:1] 2017-01-08 10:32:41,442
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2017-01-08 11:32:41,443
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [CompactionExecutor:162] 2017-01-08 12:32:39,914
> AutoSavingCache.java:386 - Saved KeyCache (108 items) in 4 ms
> INFO  [IndexSummaryManager:1] 2017-01-08 12:32:41,446
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2017-01-08 13:32:41,448
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2017-01-08 14:32:41,450
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [IndexSummaryManager:1] 2017-01-08 15:32:41,451
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO  [CompactionExecutor:170] 2017-01-08 16:32:39,915
> AutoSavingCache.java:386 - Saved KeyCache (109 items) in 4 ms
> INFO  [IndexSummaryManager:1] 2017-01-08 16:32:41,453
> IndexSummaryRedistribution.java:74 - Redistributing index summaries
> WARN  [SharedPool-Worker-4] 2017-01-08 17:30:45,048
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[SharedPool-Worker-4,5,main]: {}
> java.lang.AssertionError: null
> at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49)
> ~[apache-cassandra-3.3.0.jar:3.3.0]
> at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88)
> ~[apache-cassandra-3.3.0.jar:3.3.0]
> at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83)
> ~[apache-cassandra-3.3.0.jar:3.3.0]
> at org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175)
> ~[apache-cassandra-3.3.0.jar:3.3.0]
> at org.apache.cassandra.db.rows.ComplexColumnData.lambda$
> purge$107(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
> at 
> org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650)
> ~[apache-cassandra-3.3.0.jar:3.3.0]
> at 

Re: Logs appear to contradict themselves during bootstrap steps

2017-01-10 Thread Sotirios Delimanolis
There was no need to assassinate in this case. 'nodetool removenode' worked 
fine (didn't want to risk losing data). I just don't follow the logic described 
by the logs. 

On Friday, January 6, 2017 5:45 PM, Edward Capriolo  
wrote:
 

 
On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis  
wrote:

I forgot to check nodetool gossipinfo. Still, why does the first check think 
that the address exists, but the second doesn't? 

On Friday, January 6, 2017 1:11 PM, David Berry  
wrote:
 

 I’ve encountered this previously where, after removing a node, gossip info is 
retained for 72 hours, which doesn’t allow the IP to be reused during that 
period. You can check how long gossip will retain this information using 
“nodetool gossipinfo”, where the epoch time will be shown with the status. 
For example:

nodetool gossipinfo
/10.236.70.199
  generation:1482436691
  heartbeat:3942407
  STATUS:3942404:LEFT,3074457345618261000,1483995662276
  LOAD:3942267:3.60685807E8
  SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
  DC:20:orion
  RACK:22:r1
  RELEASE_VERSION:4:2.1.16
  RPC_ADDRESS:3:10.236.70.199
  SEVERITY:3942406:0.25094103813171387
  NET_VERSION:1:8
  HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
  TOKENS:15:

Converting it from epoch:

local@img2116saturn101:~$ date -d @$((1483995662276/1000))
Mon Jan  9 21:01:02 UTC 2017

At the time, we waited the 72 hour period before reusing the IP; I’ve not used 
replace_address previously.

From: Sotirios Delimanolis [mailto:sotodel...@yahoo.com]
Sent: Friday, January 6, 2017 2:38 PM
To: User 
Subject: Logs appear to contradict themselves during bootstrap steps

We had a node go down in our cluster and its disk had to be wiped. During that 
time, all nodes in the cluster have restarted at least once.

We want to add the bad node back to the ring. It has the same IP/hostname. I 
followed the steps here for "Adding nodes to an existing cluster." When the 
process is started up, it reports:

A node with address / already exists, cancelling join. Use 
cassandra.replace_address if you want to replace this node.

I found this error message in the StorageService, which uses the Gossiper 
instance to look up the node's state. Apparently, the node knows about it. So I 
followed the instructions, added the cassandra.replace_address system property, 
and restarted the process. But it reports:

Cannot replace_address / because it doesn't exist in gossip

So which one is it? Does the ring know about it or not? Running "nodetool ring" 
does show it on all other nodes.

I've seen CASSANDRA-8138 and the conditions are the same, but I can't 
understand why it thinks it's not part of gossip. What's the difference between 
the gossip check used to make this determination and the gossip check used for 
the first error message? Can someone explain?

I've since retrieved the node's id and used it with "nodetool removenode". 
After rebalancing, I added the node back and ran "nodetool cleanup". 
Everything's up and running, but I'd like to understand what Cassandra was 
doing.

   

In case you have not seen it, check out 
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsAssassinate.html
This is what you do when you really want something to go away from gossip.
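For reference, the 72-hour retention described earlier can be decoded directly from the gossipinfo output. A small sketch, assuming GNU coreutils `date`; my reading is that the STATUS epoch is the moment gossip drops the LEFT state (removal time plus roughly 72 hours), but verify that against your version:

```shell
#!/bin/sh
# Decode the millisecond epoch from the STATUS:...:LEFT line of
# `nodetool gossipinfo` (value taken from the message above).
left_expiry_ms=1483995662276
left_expiry_s=$((left_expiry_ms / 1000))          # drop the milliseconds
date -u -d "@$left_expiry_s" '+%Y-%m-%d %H:%M:%S UTC'
# -> 2017-01-09 21:01:02 UTC
```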

   

Re: Trying to identify the cause of these errors.

2017-01-10 Thread Gopal, Dhruva
Any suggestions/recommendations? Anything will help at this point. Thanks!

Regards,
DHRUVA GOPAL
sr. MANAGER, ENGINEERING
REPORTING, ANALYTICS AND BIG DATA
+1 408.325.2011 WORK
+1 408.219.1094 MOBILE
UNITED STATES
dhruva.go...@aspect.com
aspect.com


From: Dhruva Gopal
Date: Monday, January 9, 2017 at 1:42 PM
To: "user@cassandra.apache.org"
Cc: Richard Ney
Subject: Trying to identify the cause of these errors.

My colleague (Richard Ney) has already been in touch with you on a couple of 
other issues we’ve seen in the past. Our development team has been trying to track down 
some new issues we’ve been seeing on one of our pre-prod environments where 
we’ve been having consistent failures very often (every day or every 2-3 days), 
even if load/number of transactions are very light. We’re running a 2 data 
center deployment with 3 nodes in each data center. Our tables are setup with 
replication factor = 2 and we have 16G dedicated to the heap with the G1GC for 
garbage collection. Our systems are AWS M4.2xlarge with 8 CPUs and 32GB of RAM 
and we have 2 general purpose EBS volumes on each node of 500GB each. Once we 
hit this it seems like the only way to recover is to shutdown the cluster and 
restart. Running repairs after the restart often results in failures and we 
pretty much end up having to truncate the tables before starting up clean 
again. We are not sure if the two are inter-related. We pretty much see the 
same issue on all the nodes. If anyone has any tips or any suggestions on how 
to diagnose this further, it will help a great deal! The issues are:



Issue 1: Once the errors occur they just repeat for a bit followed by the 
errors in issue 2.

INFO  [CompactionExecutor:165] 2017-01-08 08:32:39,915 AutoSavingCache.java:386 
- Saved KeyCache (63 items) in 5 ms
INFO  [IndexSummaryManager:1] 2017-01-08 08:32:41,438 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [HANDSHAKE-ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224] 
2017-01-08 09:30:03,988 OutboundTcpConnection.java:505 - Handshaking version 
with ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224
INFO  [IndexSummaryManager:1] 2017-01-08 09:32:41,440 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN  [SharedPool-Worker-9] 2017-01-08 10:30:00,116 BatchStatement.java:289 - 
Batch of prepared statements for [manhattan.rcmessages] is of size 9264, 
exceeding specified threshold of 5120 by 4144.
INFO  [IndexSummaryManager:1] 2017-01-08 10:32:41,442 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 11:32:41,443 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [CompactionExecutor:162] 2017-01-08 12:32:39,914 AutoSavingCache.java:386 
- Saved KeyCache (108 items) in 4 ms
INFO  [IndexSummaryManager:1] 2017-01-08 12:32:41,446 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 13:32:41,448 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 14:32:41,450 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 15:32:41,451 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [CompactionExecutor:170] 2017-01-08 16:32:39,915 AutoSavingCache.java:386 
- Saved KeyCache (109 items) in 4 ms
INFO  [IndexSummaryManager:1] 2017-01-08 16:32:41,453 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN  [SharedPool-Worker-4] 2017-01-08 17:30:45,048 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-4,5,main]: {}
java.lang.AssertionError: null
at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at 
org.apache.cassandra.db.rows.ComplexColumnData.lambda$purge$107(ComplexColumnData.java:165)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
at 
org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) 
~[apache-cassandra-3.3.0.jar:3.3.0]
at 
org.apache.cassandra.db.rows.ComplexColumnData.transformAndFilter(ComplexColumnData.java:170)
 ~[apache-cassandra-3.3.0.jar:3.3.0]
at 

Re: Strange issue wherein cassandra not being started from cron

2017-01-10 Thread Edward Capriolo
On Tuesday, January 10, 2017, Jonathan Haddad  wrote:

> Last I checked, cron doesn't load the same, full environment you see when
> you log in. Also, why put Cassandra on a cron?
> On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal wrote:
>
>> Hi Ajay,
>>
>> Have you had a look at cron logs? - mine is in path /var/log/cron
>>
>> Thanks & Regards,
>>
>> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg wrote:
>>
>>> Hi All.
>>>
>>> Facing a very weird issue, wherein the command
>>>
>>> */etc/init.d/cassandra start*
>>>
>>> causes cassandra to start when the command is run from command-line.
>>>
>>>
>>> However, if I put the above as a cron job
>>>
>>>
>>>
>>> * * * * * /etc/init.d/cassandra start
>>> cassandra never starts.
>>>
>>>
>>> I have checked, and "cron" service is running.
>>>
>>>
>>> Any ideas what might be wrong?
>>> I am pasting the cassandra script for brevity.
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>> 
>>> 
>>> #! /bin/sh
>>> ### BEGIN INIT INFO
>>> # Provides:  cassandra
>>> # Required-Start:$remote_fs $network $named $time
>>> # Required-Stop: $remote_fs $network $named $time
>>> # Should-Start:  ntp mdadm
>>> # Should-Stop:   ntp mdadm
>>> # Default-Start: 2 3 4 5
>>> # Default-Stop:  0 1 6
>>> # Short-Description: distributed storage system for structured data
>>> # Description:   Cassandra is a distributed (peer-to-peer) system for
>>> #the management and storage of structured data.
>>> ### END INIT INFO
>>>
>>> # Author: Eric Evans
>>>
>>> DESC="Cassandra"
>>> NAME=cassandra
>>> PIDFILE=/var/run/$NAME/$NAME.pid
>>> SCRIPTNAME=/etc/init.d/$NAME
>>> CONFDIR=/etc/cassandra
>>> WAIT_FOR_START=10
>>> CASSANDRA_HOME=/usr/share/cassandra
>>> FD_LIMIT=10
>>>
>>> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
>>> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
>>> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>>>
>>> # Read configuration variable file if it is present
>>> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>>>
>>> # Read Cassandra environment file.
>>> . /etc/cassandra/cassandra-env.sh
>>>
>>> if [ -z "$JVM_OPTS" ]; then
>>> echo "Initialization failed; \$JVM_OPTS not set!" >&2
>>> exit 3
>>> fi
>>>
>>> export JVM_OPTS
>>>
>>> # Export JAVA_HOME, if set.
>>> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>>>
>>> # Load the VERBOSE setting and other rcS variables
>>> . /lib/init/vars.sh
>>>
>>> # Define LSB log_* functions.
>>> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
>>> . /lib/lsb/init-functions
>>>
>>> #
>>> # Function that returns 0 if process is running, or nonzero if not.
>>> #
>>> # The nonzero value is 3 if the process is simply not running, and 1 if
>>> the
>>> # process is not running but the pidfile exists (to match the exit codes
>>> for
>>> # the "status" command; see LSB core spec 3.1, section 20.2)
>>> #
>>> CMD_PATT="cassandra.+CassandraDaemon"
>>> is_running()
>>> {
>>> if [ -f $PIDFILE ]; then
>>> pid=`cat $PIDFILE`
>>> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
>>> return 1
>>> fi
>>> return 3
>>> }
>>> #
>>> # Function that starts the daemon/service
>>> #
>>> do_start()
>>> {
>>> # Return
>>> #   0 if daemon has been started
>>> #   1 if daemon was already running
>>> #   2 if daemon could not be started
>>>
>>> ulimit -l unlimited
>>> ulimit -n "$FD_LIMIT"
>>>
>>> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
>>> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
>>> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
>>>
>>> [ -e `dirname "$PIDFILE"` ] || \
>>> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
>>>
>>>
>>>
>>> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q -p
>>> "$PIDFILE" -t >/dev/null || return 1
>>>
>>> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p
>>> "$PIDFILE" -- \
>>> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null ||
>>> return 2
>>>
>>> }
>>>
>>> #
>>> # Function that stops the daemon/service
>>> #
>>> do_stop()
>>> {
>>> # Return
>>> #   0 if daemon has been stopped
>>> #   1 if daemon was already stopped
>>> #   2 if daemon could not be stopped
>>> #   other if a failure occurred
>>> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
>>> RET=$?
>>> rm -f "$PIDFILE"
>>> return $RET
>>> }
>>>
>>> case "$1" in
>>>   start)
>>> [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
>>> do_start
>>> 
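As Jonathan noted earlier in the thread, cron does not load the full login environment, which is the most common reason a script like the one above works from a shell but not from cron. A hedged crontab sketch (paths and the `@reboot` approach are illustrative, not Cassandra guidance — adjust for your distro):

```shell
# Illustrative crontab entries. cron's default environment is minimal:
# PATH is typically /usr/bin:/bin and JAVA_HOME is unset, which silently
# breaks init scripts that work fine from an interactive shell.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

# Start once at boot (instead of every minute) and capture output for debugging:
@reboot /etc/init.d/cassandra start >> /var/log/cassandra-cron.log 2>&1
```

Redirecting the output to a log file also surfaces the actual failure, which the silent `* * * * *` entry hides.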

Documented CQL limits seem to be wrong (how are sets and lists implemented in the storage layer)

2017-01-10 Thread Sotirios Delimanolis
We're using Cassandra 2.2.
This document lists a number of CQL limits. I'm particularly interested in the 
Collection limits for Set and List. If I've interpreted it correctly, the 
document states that values in Sets are limited to 65535 bytes. 
This limit, as far as I know, exists because the set identity is implemented 
with a composite value in the column name of the storage engine's cell (similar 
to the clustering column value limit), which CQL restricts to that many bytes. 
(Is this correct?)
Consider a table like
CREATE TABLE test.bounds (
    someid text,
    someorder text,
    words set<text>,
    PRIMARY KEY (someid, someorder))
with

PreparedStatement ps = session.prepare("INSERT INTO bounds (someid, someorder, words) VALUES (?, ?, ?)");
BoundStatement bs = ps.bind("id", "order", ImmutableSet.of(StringUtils.repeat('a', 66000)));
session.execute(bs);

This will throw the expected exception
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: The sum 
of all clustering columns is too long (66024 > 65535)
Now if I change the table to use a List instead of a Set
CREATE TABLE test.bounds (
    someid text,
    someorder text,
    words list<text>,
    PRIMARY KEY (someid, someorder))
and use
BoundStatement bs = ps.bind("id", "order", 
ImmutableList.of(StringUtils.repeat('a', 66000)));
I do not receive an exception. The document, however, states that List value 
sizes are also limited to 65535 bytes. Is the document incorrect or am I 
misinterpreting?
I assumed List values are implemented as simple column values in the underlying 
storage and the order is maintained through their timestamps.
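To make the question concrete, here is my (hedged) understanding of how the pre-3.0 storage engine lays out the two collection types; treat the layout notes as an assumption to verify against the storage engine code, not as an authoritative answer:

```sql
-- Pseudo-layout sketch, not runnable CQL.
--
-- set<text>: the element value is encoded into the cell NAME:
--   cell name = (someorder, 'words', <element bytes>),  cell value = (empty)
--   => each element is subject to the 65535-byte cell-name/clustering limit,
--      hence the InvalidQueryException for the 66000-byte element.
--
-- list<text>: the element lives in the cell VALUE; the cell name carries a
-- timeuuid "position" that preserves insertion order:
--   cell name = (someorder, 'words', <timeuuid>),  cell value = <element bytes>
--   => the element is only bounded by the much larger cell-value limit,
--      which would explain why the 66000-byte list element is accepted.
```

If this layout is right, the documented 65535-byte limit applies to set elements and map keys, but not to list elements or map values.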


Re: Backups eating up disk space

2017-01-10 Thread Khaja, Raziuddin (NIH/NLM/NCBI) [C]
Hello Kunal,

I would take a look at the following configuration options in the Cassandra.yaml

Common automatic backup settings
Incremental_backups:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__incremental_backups

(Default: false) Backs up data updated since the last snapshot was taken. When 
enabled, Cassandra creates a hard link to each SSTable flushed or streamed 
locally in a backups subdirectory of the keyspace data. Removing these links is 
the operator's responsibility.

snapshot_before_compaction:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__snapshot_before_compaction

(Default: false) Enables or disables taking a snapshot before each compaction. 
A snapshot is useful to back up data when there is a data format change. Be 
careful using this option: Cassandra does not clean up older snapshots 
automatically.


Advanced automatic backup setting
auto_snapshot:
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__auto_snapshot

(Default: true) Enables or disables whether Cassandra takes a snapshot of the 
data before truncating a keyspace or dropping a table. To prevent data loss, 
Datastax strongly advises using the default setting. If you set auto_snapshot 
to false, you lose data on truncation or drop.


nodetool also provides methods to manage snapshots. 
http://docs.datastax.com/en/archived/cassandra/3.x/cassandra/tools/toolsNodetool.html
See the specific commands:

  *   nodetool clearsnapshot: Removes one or more snapshots.
  *   nodetool listsnapshots: Lists snapshot names, size on disk, and true size.
  *   nodetool snapshot: Takes a snapshot of one or more keyspaces, or of a table, to back up data.

As far as I am aware, using rm is perfectly safe to delete the directories for 
snapshots/backups as long as you are careful not to delete your actively used 
sstable files and directories.  I think the nodetool clearsnapshot command is 
provided so that you don’t accidentally delete actively used files.  Last I 
used clearsnapshot (a very long time ago), I thought it left behind the 
directory, but this could have been fixed in newer versions (so you might want 
to check that).
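A minimal sketch of that snapshot lifecycle with the commands above (keyspace and tag names are examples, and these obviously require a running node; verify the flags against `nodetool help` for your version):

```shell
nodetool snapshot -t pre_change my_keyspace   # take a named snapshot
nodetool listsnapshots                        # names, size on disk, true size
nodetool clearsnapshot -t pre_change          # remove that snapshot on this node
```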

HTH
-Razi


From: Jonathan Haddad 
Reply-To: "user@cassandra.apache.org" 
Date: Tuesday, January 10, 2017 at 12:26 PM
To: "user@cassandra.apache.org" 
Subject: Re: Backups eating up disk space

If you remove the files from the backup directory, you would not have data loss 
in the case of a node going down.  They're hard links to the same files that 
are in your data directory, and are created when an sstable is written to disk. 
 At the time, they take up (almost) no space, so they aren't a big deal, but 
when the sstable gets compacted, they stick around, so they end up not freeing 
space up.

Usually you use incremental backups as a means of moving the sstables off the 
node to a backup location.  If you're not doing anything with them, they're 
just wasting space and you should disable incremental backups.

Some people take snapshots then rely on incremental backups.  Others use the 
tablesnap utility which does sort of the same thing.

On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar 
> wrote:
Thanks for quick reply, Jon.

But, what about in case of node/cluster going down? Would there be data loss if 
I remove these files manually?

How is it typically managed in production setups?
What are the best-practices for the same?
Do people take snapshots on each node before removing the backups?

This is my first production deployment - so, still trying to learn.

Thanks,
Kunal

On 10 January 2017 at 21:36, Jonathan Haddad 
> wrote:
You can just delete them off the filesystem (rm)

On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar 
> wrote:
Hi all,

We have a 3-node cassandra cluster with incremental backup set to true.
Each node has 1TB data volume that stores cassandra data.

The load in the output of 'nodetool status' comes up at around 260GB each node.
All our keyspaces use replication factor = 3.

However, the df output shows the data volumes consuming around 850GB of space.
I checked the keyspace directory structures - most of the space goes in 
/data///backups.

We have never manually run snapshots.

What is the typical procedure to clear the backups?
Can it be done without taking the node 

Point in time restore

2017-01-10 Thread Hannu Kröger
Hello,

Are there any guides how to do a point-in-time restore for Cassandra?

All I have seen is this:
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/configuration/configLogArchive_t.html
 


That gives an idea how to store the data for restore but how to do an actual 
restore is still a mystery to me.

Any pointers?

Cheers,
Hannu
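Not a full answer, but the restore side of the commitlog archiving linked above is configured in conf/commitlog_archiving.properties. A hedged sketch of the restore settings (property names per the 2.0-era docs; values are illustrative, and the timestamp format should be verified for your version):

```properties
# commitlog_archiving.properties (restore side, illustrative values).
# Cassandra runs restore_command for each archived segment at startup,
# substituting %from (archived file) and %to (live commitlog location):
restore_command=cp -f %from %to
restore_directories=/mnt/commitlog_archive
# Replay mutations only up to this point in time (format yyyy:MM:dd HH:mm:ss):
restore_point_in_time=2017:01:10 12:00:00
```

The usual procedure, as I understand it, is: restore the base snapshot into the data directory, point these properties at the archived segments, and start the node so it replays the commitlog up to restore_point_in_time.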

RE: RemoveNode CPU Spike Question

2017-01-10 Thread Anubhav Kale
Well, looking through the logs I confirmed that my understanding below is correct, 
but it would be good to hear from experts for sure.

From: Anubhav Kale [mailto:anubhav.k...@microsoft.com]
Sent: Tuesday, January 10, 2017 9:58 AM
To: user@cassandra.apache.org
Cc: Sean Usher 
Subject: RemoveNode CPU Spike Question

Hello,

Recently, I started noticing an interesting pattern. When I execute 
“removenode”, a subset of the nodes that now own the tokens show a CPU 
spike / disk activity, and sometimes the SSTable count on those nodes shoots up.

After looking through the code, it appears to me that the function below forces 
data to be streamed from some of the new owner nodes when 
“removenode” is kicked off. Is my understanding correct?

https://github.com/apache/cassandra/blob/d384e781d6f7c028dbe88cfe9dd3e966e72cd046/src/java/org/apache/cassandra/service/StorageService.java#L2548

Our nodes don’t run very hot, but it appears this streaming causes them to have 
issues. Have other people seen this ?

Thanks !


RemoveNode CPU Spike Question

2017-01-10 Thread Anubhav Kale
Hello,

Recently, I started noticing an interesting pattern. When I execute 
"removenode", a subset of the nodes that now own the tokens result it in a CPU 
spike / disk activity, and sometimes SSTables on those nodes shoot up.

After looking through the code, it appears to me that the function below forces 
data to be streamed from some of the new owner nodes when 
"removenode" is kicked off. Is my understanding correct?

https://github.com/apache/cassandra/blob/d384e781d6f7c028dbe88cfe9dd3e966e72cd046/src/java/org/apache/cassandra/service/StorageService.java#L2548

Our nodes don't run very hot, but it appears this streaming causes them to have 
issues. Have other people seen this ?

Thanks !


Re: Backups eating up disk space

2017-01-10 Thread Jonathan Haddad
If you remove the files from the backup directory, you would not have data
loss in the case of a node going down.  They're hard links to the same
files that are in your data directory, and are created when an sstable is
written to disk.  At the time, they take up (almost) no space, so they
aren't a big deal, but when the sstable gets compacted, they stick around,
so they end up not freeing space up.

Usually you use incremental backups as a means of moving the sstables off
the node to a backup location.  If you're not doing anything with them,
they're just wasting space and you should disable incremental backups.

Some people take snapshots then rely on incremental backups.  Others use
the tablesnap utility which does sort of the same thing.
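The hard-link behavior Jonathan describes is plain filesystem semantics, which a quick sketch can demonstrate (nothing Cassandra-specific here; file names are illustrative):

```shell
#!/bin/sh
# A hard link shares the inode with the original file, so the "backup"
# costs no extra data blocks until the original name is removed
# (e.g. when the sstable is compacted away).
set -e
tmp=$(mktemp -d)
echo "sstable contents" > "$tmp/data.db"
ln "$tmp/data.db" "$tmp/backups-data.db"      # hard link, no data copied

# Both names resolve to the same inode:
inode_of() { ls -i "$1" | awk '{print $1}'; }
[ "$(inode_of "$tmp/data.db")" = "$(inode_of "$tmp/backups-data.db")" ] \
    && echo "same inode"

# Deleting the "live" copy does not free the blocks while the link remains:
rm "$tmp/data.db"
cat "$tmp/backups-data.db"                    # -> sstable contents
rm -r "$tmp"
```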

On Tue, Jan 10, 2017 at 9:18 AM Kunal Gangakhedkar 
wrote:

> Thanks for quick reply, Jon.
>
> But, what about in case of node/cluster going down? Would there be data
> loss if I remove these files manually?
>
> How is it typically managed in production setups?
> What are the best-practices for the same?
> Do people take snapshots on each node before removing the backups?
>
> This is my first production deployment - so, still trying to learn.
>
> Thanks,
> Kunal
>
> On 10 January 2017 at 21:36, Jonathan Haddad  wrote:
>
> You can just delete them off the filesystem (rm)
>
> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar <
> kgangakhed...@gmail.com> wrote:
>
> Hi all,
>
> We have a 3-node cassandra cluster with incremental backup set to true.
> Each node has 1TB data volume that stores cassandra data.
>
> The load in the output of 'nodetool status' comes up at around 260GB each
> node.
> All our keyspaces use replication factor = 3.
>
> However, the df output shows the data volumes consuming around 850GB of
> space.
> I checked the keyspace directory structures - most of the space goes in
> /data///backups.
>
> We have never manually run snapshots.
>
> What is the typical procedure to clear the backups?
> Can it be done without taking the node offline?
>
> Thanks,
> Kunal
>
>
>


Re: Backups eating up disk space

2017-01-10 Thread Kunal Gangakhedkar
Thanks for quick reply, Jon.

But, what about in case of node/cluster going down? Would there be data
loss if I remove these files manually?

How is it typically managed in production setups?
What are the best-practices for the same?
Do people take snapshots on each node before removing the backups?

This is my first production deployment - so, still trying to learn.

Thanks,
Kunal

On 10 January 2017 at 21:36, Jonathan Haddad  wrote:

> You can just delete them off the filesystem (rm)
>
> On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar <
> kgangakhed...@gmail.com> wrote:
>
>> Hi all,
>>
>> We have a 3-node cassandra cluster with incremental backup set to true.
>> Each node has 1TB data volume that stores cassandra data.
>>
>> The load in the output of 'nodetool status' comes up at around 260GB each
>> node.
>> All our keyspaces use replication factor = 3.
>>
>> However, the df output shows the data volumes consuming around 850GB of
>> space.
>> I checked the keyspace directory structures - most of the space goes in
>> /data///backups.
>>
>> We have never manually run snapshots.
>>
>> What is the typical procedure to clear the backups?
>> Can it be done without taking the node offline?
>>
>> Thanks,
>> Kunal
>>
>


Re: Incremental Repair Migration

2017-01-10 Thread Bhuvan Rawal
Hi Amit,

You can try Reaper; it makes repairs effortless. There are a host of other
benefits, but most importantly it offers a single portal to manage & track
ongoing as well as past repairs.

 For incremental repairs, it breaks the repair into a single segment per node.
If you find that's indeed the case, you may have to increase the segment
timeout when you run it for the first time, as it repairs the whole set of
sstables.

Regards,
Bhuvan

On Jan 10, 2017 8:44 PM, "Jonathan Haddad"  wrote:

Reaper supports incremental repair.
On Mon, Jan 9, 2017 at 11:27 PM Amit Singh F 
wrote:

> Hi Jonathan,
>
>
>
> Really appreciate your response.
>
>
>
> It will not be possible for us to move to Reaper as of now; we are in the
> process of migrating to incremental repair.
>
>
>
> Also, running repair constantly will be a costly affair in our case.
> Migrating to incremental repair with a large dataset will take hours to
> finish if we go ahead with the procedure shared by Datastax.
>
>
>
> So any quick method to reduce that ?
>
>
>
> Regards
>
> Amit Singh
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Tuesday, January 10, 2017 11:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Incremental Repair Migration
>
>
>
> Your best bet is to just run repair constantly. We maintain an updated
> fork of Spotify's reaper tool to help manage it: https://github.com/
> thelastpickle/cassandra-reaper
>
> On Mon, Jan 9, 2017 at 10:04 PM Amit Singh F 
> wrote:
>
> Hi All,
>
>
>
> We are thinking of migrating from primary range repair (-pr) to
> incremental repair.
>
>
>
> Environment :
>
>
>
> • Cassandra 2.1.16
>
> • 25 Node cluster ,
>
> • RF 3
>
> • Data size up to 450 GB per nodes
>
>
>
> We found that running full repair will take around 8 hrs per node,
> which *means 200-odd hrs* for migrating the entire cluster to
> incremental repair. Even though there is zero downtime, it is quite
> unreasonable to ask for a 200 hr maintenance window for migrating repairs.
>
>
>
> Just want to know how Cassandra users in the community optimize the
> procedure to reduce migration time?
>
>
>
> Thanks & Regards
>
> Amit Singh
>
>


Re: Backups eating up disk space

2017-01-10 Thread Jonathan Haddad
You can just delete them off the filesystem (rm)
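For illustration, a sketch of what that looks like. The paths and
keyspace/table names below are made up for the demo; in production, point
DATA_DIR at the data_file_directories value from cassandra.yaml (commonly
/var/lib/cassandra/data):

```shell
# Demo on a scratch directory; in production set DATA_DIR to the
# data_file_directories path from cassandra.yaml. Layout assumption:
# <data_dir>/<keyspace>/<table>/backups holds the incremental backups.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/my_ks/my_table/backups"
touch "$DATA_DIR/my_ks/my_table/backups/my_ks-my_table-ka-1-Data.db"

# Incremental backups are hard links to flushed SSTables; deleting them does
# not touch the live data files, so the node can stay online while you do it.
find "$DATA_DIR" -path '*/backups/*' -type f -delete
```

Deleting only files under the backups directories (never the SSTables beside
them) is the safe variant of the plain `rm`.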

On Tue, Jan 10, 2017 at 8:02 AM Kunal Gangakhedkar 
wrote:

> Hi all,
>
> We have a 3-node cassandra cluster with incremental backup set to true.
> Each node has 1TB data volume that stores cassandra data.
>
> The load in the output of 'nodetool status' comes up at around 260GB each
> node.
> All our keyspaces use replication factor = 3.
>
> However, the df output shows the data volumes consuming around 850GB of
> space.
> I checked the keyspace directory structures - most of the space goes in
> /data/<keyspace>/<table>/backups.
>
> We have never manually run snapshots.
>
> What is the typical procedure to clear the backups?
> Can it be done without taking the node offline?
>
> Thanks,
> Kunal
>



Re: incremental repairs with -pr flag?

2017-01-10 Thread Bruno Lavoie


On 2016-10-24 13:39 (-0500), Alexander Dejanovski  
wrote: 
> Hi Sean,
> 
> In order to mitigate its impact, anticompaction is not fully executed when
> incremental repair is run with -pr. What you'll observe is that running
> repair on all nodes with -pr will leave sstables marked as unrepaired on
> all of them.
> 
> Then, if you think about it, you realize it's no big deal, as -pr is useless
> with incremental repair: data is repaired only once with incremental
> repair, which is what -pr intended to fix in full repair, by repairing all
> token ranges only once instead of as many times as the replication factor.
> 
> Cheers,
> 
> Le lun. 24 oct. 2016 18:05, Sean Bridges  a
> écrit :
> 
> > Hey,
> >
> > In the datastax documentation on repair [1], it says,
> >
> > "The partitioner range option is recommended for routine maintenance. Do
> > not use it to repair a downed node. Do not use with incremental repair
> > (default for Cassandra 3.0 and later)."
> >
> > Why is it not recommended to use -pr with incremental repairs?
> >
> > Thanks,
> >
> > Sean
> >
> > [1]
> > https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsRepairNodesManualRepair.html
> > --
> >
> > Sean Bridges
> >
> > senior systems architect
> > Global Relay
> >
> > *sean.brid...@globalrelay.net* 
> >
> > *866.484.6630 *
> > New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore
> > (+65.3158.1301)
> >
> > Global Relay Archive supports email, instant messaging, BlackBerry,
> > Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter,
> > Facebook and more.
> >
> > Ask about *Global Relay Message*
> >  - The Future of
> > Collaboration in the Financial Services World
> >
> > All email sent to or from this address will be retained by Global Relay's
> > email archiving system. This message is intended only for the use of the
> > individual or entity to which it is addressed, and may contain information
> > that is privileged, confidential, and exempt from disclosure under
> > applicable law. Global Relay will not be liable for any compliance or
> > technical information provided herein. All trademarks are the property of
> > their respective owners.
> >
> > --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
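A back-of-envelope sketch of the duplication being described (the node count
is hypothetical; the point is the factor of RF):

```python
# Illustrative only: how many per-range repair passes happen cluster-wide when
# every node runs a repair, with and without -pr (one token range per node).
nodes, rf = 6, 3                 # hypothetical cluster, replication factor 3
without_pr = nodes * rf          # full repair: each node repairs every range it replicates
with_pr = nodes                  # -pr: each node repairs only its primary range
print(without_pr, with_pr)       # every range is repaired rf times vs. exactly once
```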

Hello,

I was looking for exactly the same detail in the DataStax documentation, and
I'm not sure I understand everything from your response. I looked in my copy
of Cassandra: The Definitive Guide and found nothing about this detail either.

IIRC:
- with incremental repair, is it safe to simply run 'nodetool repair' on each
node, without any overhead or wasted resources (Merkle tree building,
compaction, etc.)?
- I've read that we must manually run an anti-entropy repair on each node
weekly, or at least before gc_grace_seconds (default 10 days)? Or only on a
returning dead node?

What's bad about running incremental repair on primary ranges only, node by
node? It looks like a stepwise method to keep data consistent.

In many of the sources I'm looking at, all examples are given as «nodetool
repair -pr», with no mention of using -full with -pr as is done here:
http://www.datastax.com/dev/blog/repair-in-cassandra


So, to keep a system healthy, with less impact:
- what command to run nighly?
- what command to run weekly?

We're using C* 3.x

Thanks
Bruno Lavoie




How to calculate CPU Utilisation on each node?

2017-01-10 Thread Thomas Julian
Hello,

We are using Cassandra 2.1.13. We are calculating node CPU utilization using
the formula below:

CPUUsage = CPURate / (AvailableProcessors * 100)

CPURate = (x2 - x1) / (t2 - t1),

where x2 and x1 are the values of the attribute ProcessCpuTime at times t2
and t1 respectively.

We retrieve the values of the attributes ProcessCpuTime and
AvailableProcessors from the ObjectName java.lang:type=OperatingSystem using
JMX.
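As a minimal sketch of that calculation (names are illustrative): note that
the OperatingSystem MXBean reports ProcessCpuTime in nanoseconds, so the
timestamps t1 and t2 must be in the same unit; mixing nanoseconds with
millisecond timestamps is a common way to end up with a bogus 100% reading.

```python
def cpu_utilization(x1, x2, t1, t2, available_processors):
    """CPU utilization (0.0 to 1.0) of the process between two JMX samples.

    x1, x2: ProcessCpuTime readings, in nanoseconds.
    t1, t2: wall-clock timestamps of the two samples, also in nanoseconds.
    """
    cpu_rate = (x2 - x1) / (t2 - t1)       # CPU-seconds consumed per wall-clock second
    return cpu_rate / available_processors  # normalize by core count

# 8 s of CPU time over a 1 s window on a 32-core box is 25% utilization
print(cpu_utilization(0, 8_000_000_000, 0, 1_000_000_000, 32))  # 0.25
```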

Is this the correct way to calculate the CPU utilization for a node? Or are
there any other alternatives to calculate the CPU utilization per node?

We are using a 32-core physical processor on each node, and node CPU
utilization reaches 100% every now and then. We suspect that should not be
the case.



Best Regards,

Julian.