RE: Incremental Repair Migration

2017-01-09 Thread Amit Singh F
Hi Jonathan,

Really appreciate your response.

It will not be possible for us to move to Reaper as of now; we are in the
process of migrating to incremental repair.

Also, running repair constantly would be a costly affair in our case. Migrating
a large dataset to incremental repair will take hours to finish if we go ahead
with the procedure shared by DataStax.

So is there any quick method to reduce that time?

Regards
Amit Singh

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Tuesday, January 10, 2017 11:50 AM
To: user@cassandra.apache.org
Subject: Re: Incremental Repair Migration

Your best bet is to just run repair constantly. We maintain an updated fork of 
Spotify's reaper tool to help manage it: 
https://github.com/thelastpickle/cassandra-reaper
On Mon, Jan 9, 2017 at 10:04 PM Amit Singh F 
> wrote:
Hi All,

We are thinking of migrating from primary range repair (-pr) to incremental 
repair.

Environment :


• Cassandra 2.1.16
• 25-node cluster
• RF 3
• Data size up to 450 GB per node

We found that a full repair takes around 8 hours per node, which means 200-odd
hours to migrate the entire cluster to incremental repair. Even though there is
zero downtime, it is quite unreasonable to ask for a 200-hour maintenance
window just for migrating repairs.

We just want to know how Cassandra users in the community optimize this
procedure to reduce the migration time.

Thanks & Regards
Amit Singh
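
For reference, a sketch of the per-node steps behind the DataStax procedure
referenced in this thread (2.1-era; the keyspace/table names and data path are
placeholders, so verify against the official documentation before use):

    # 1. Keep compaction from mixing repaired and unrepaired sstables.
    nodetool disableautocompaction my_ks my_table
    # 2. Run one last full repair on the node.
    nodetool repair my_ks my_table
    # 3. Stop the node, then mark its existing sstables as repaired.
    ls /var/lib/cassandra/data/my_ks/my_table-*/*Data.db > sstables.txt
    sstablerepairedset --really-set --is-repaired -f sstables.txt
    # 4. Restart the node and re-enable autocompaction.
    nodetool enableautocompaction my_ks my_table

Since steps 3-4 are per-node and offline, several nodes can in principle be
migrated in parallel to shorten the wall-clock time, at the cost of reduced
availability while they are down.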


OutOfMemoryError in startup process

2017-01-09 Thread Yuji Ito
Hi all,

I got an OutOfMemoryError during the startup process, as shown below.
I have 3 questions about the error.

1. Why does the Cassandra I built myself cause OutOfMemory errors?
The OutOfMemory errors happened during the startup process on some (not all)
nodes running Cassandra 2.2.8 that I got from GitHub and built myself.
However, the error didn't happen on Cassandra 2.2.8 installed via yum.
Is there any difference between the GitHub and yum versions of Cassandra?

2. Why can Cassandra continue the startup process when an OutOfMemory error
happens while initializing system.hints?
Is that because a failure to load the summary index isn't fatal?

3. Does this error cause consistency problems, such as data being rolled back?
In my test, some updates were lost after the error happened (i.e. stale data
were read).

My cluster has 3 nodes. Each node is an AWS EC2 m4.large (2 cores, 8 GB memory)
running Amazon Linux.
I ran a test that issued a lot of updates while each Cassandra node was killed
and restarted.
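
One way to dig into question 1 (a sketch, assuming stock JDK 8 tools on the
path): since -XX:+HeapDumpOnOutOfMemoryError is enabled in the JVM arguments
below, each failing node should have written a heap dump that can be compared
between the GitHub build and the yum build:

    # Live heap configuration of a running node.
    jmap -heap $(pgrep -f CassandraDaemon)
    # The .hprof written at OOM time lands in the working directory unless
    # -XX:HeapDumpPath is set; browse it with jhat or Eclipse MAT.
    jhat java_pid*.hprof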

== logs ==
INFO  [main] 2016-12-10 09:46:50,204 ColumnFamilyStore.java:389 - Initializing system.hints
ERROR [SSTableBatchOpen:1] 2016-12-10 09:46:50,359 DebuggableThreadPoolExecutor.java:242 - Error in ThreadPoolExecutor
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.deserializeBounds(MmappedSegmentedFile.java:411) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader.loadSummary(SSTableReader.java:850) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:700) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:672) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:466) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:371) ~[main/:na]
at org.apache.cassandra.io.sstable.format.SSTableReader$4.run(SSTableReader.java:509) ~[main/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_91]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]


== JVM arguments ==
INFO  [main] 2016-12-10 09:42:53,798 CassandraDaemon.java:417 - JVM
Arguments: [-ea, -javaagent:/home/ec2-user/cassandra/bin/../lib/jamm-0.3.0.jar,
-XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities,
-XX:ThreadPriorityPolicy=42, -Xms1996M, -Xmx1996M, -Xmn200M,
-XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=103,
-XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled,
-XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
-XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly,
-XX:+UseTLAB, -XX:+PerfDisableSharedMem,
-XX:CompileCommandFile=/home/ec2-user/cassandra/bin/../conf/hotspot_compiler,
-XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
-XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=1, -XX:+PrintGCDetails,
-XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution,
-XX:+PrintGCApplicationStoppedTime, -XX:+PrintPromotionFailure,
-Xloggc:/home/ec2-user/cassandra/bin/../logs/gc.log, -XX:+UseGCLogFileRotation,
-XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M,
-Djava.net.preferIPv4Stack=true, -Dcassandra.jmx.local.port=7199,
-XX:+DisableExplicitGC,
-Djava.library.path=/home/ec2-user/cassandra/bin/../lib/sigar-bin,
-Dlogback.configurationFile=logback.xml,
-Dcassandra.logdir=/home/ec2-user/cassandra/bin/../logs,
-Dcassandra.storagedir=/home/ec2-user/cassandra/bin/../data]


thanks,
Yuji


Re: Incremental Repair Migration

2017-01-09 Thread Jonathan Haddad
Your best bet is to just run repair constantly. We maintain an updated fork
of Spotify's reaper tool to help manage it:
https://github.com/thelastpickle/cassandra-reaper
On Mon, Jan 9, 2017 at 10:04 PM Amit Singh F 
wrote:

> Hi All,
>
>
>
> We are thinking of migrating from primary range repair (-pr) to
> incremental repair.
>
>
>
> Environment :
>
>
>
> • Cassandra 2.1.16
>
> • 25-node cluster
>
> • RF 3
>
> • Data size up to 450 GB per node
>
>
>
> We found that a full repair takes around 8 hours per node, which *means
> 200-odd hours* to migrate the entire cluster to incremental repair. Even
> though there is zero downtime, it is quite unreasonable to ask for a 200-hour
> maintenance window just for migrating repairs.
>
>
>
> We just want to know how Cassandra users in the community optimize this
> procedure to reduce the migration time.
>
>
>
> Thanks & Regards
>
> Amit Singh
>


Re: Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Jonathan Haddad
Last I checked, cron doesn't load the same full environment you see when you
log in. Also, why put Cassandra on a cron?
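
A minimal crontab sketch of that environment point (the PATH value and log file
are assumptions, and @reboot is the usual way to start a service at boot rather
than retrying every minute):

    # Cron supplies a near-empty environment, so set what the init script needs.
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    # Run once at boot and capture the output for debugging.
    @reboot /etc/init.d/cassandra start >> /var/log/cassandra-cron.log 2>&1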
On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal  wrote:

> Hi Ajay,
>
> Have you had a look at cron logs? - mine is in path /var/log/cron
>
> Thanks & Regards,
>
> On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg  wrote:
>
> Hi All.
>
> Facing a very weird issue, wherein the command
>
> */etc/init.d/cassandra start*
>
> causes cassandra to start when the command is run from the command line.
>
>
> However, if I put the above as a cron job
>
>
>
> ** * * * * /etc/init.d/cassandra start*
> cassandra never starts.
>
>
> I have checked, and "cron" service is running.
>
>
> Any ideas what might be wrong?
> I am pasting the cassandra init script for reference.
>
>
> Thanks and Regards,
> Ajay
>
>
>
> 
> #! /bin/sh
> ### BEGIN INIT INFO
> # Provides:  cassandra
> # Required-Start:$remote_fs $network $named $time
> # Required-Stop: $remote_fs $network $named $time
> # Should-Start:  ntp mdadm
> # Should-Stop:   ntp mdadm
> # Default-Start: 2 3 4 5
> # Default-Stop:  0 1 6
> # Short-Description: distributed storage system for structured data
> # Description:   Cassandra is a distributed (peer-to-peer) system for
> #the management and storage of structured data.
> ### END INIT INFO
>
> # Author: Eric Evans 
>
> DESC="Cassandra"
> NAME=cassandra
> PIDFILE=/var/run/$NAME/$NAME.pid
> SCRIPTNAME=/etc/init.d/$NAME
> CONFDIR=/etc/cassandra
> WAIT_FOR_START=10
> CASSANDRA_HOME=/usr/share/cassandra
> FD_LIMIT=10
>
> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>
> # Read configuration variable file if it is present
> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>
> # Read Cassandra environment file.
> . /etc/cassandra/cassandra-env.sh
>
> if [ -z "$JVM_OPTS" ]; then
> echo "Initialization failed; \$JVM_OPTS not set!" >&2
> exit 3
> fi
>
> export JVM_OPTS
>
> # Export JAVA_HOME, if set.
> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>
> # Load the VERBOSE setting and other rcS variables
> . /lib/init/vars.sh
>
> # Define LSB log_* functions.
> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
> . /lib/lsb/init-functions
>
> #
> # Function that returns 0 if process is running, or nonzero if not.
> #
> # The nonzero value is 3 if the process is simply not running, and 1 if the
> # process is not running but the pidfile exists (to match the exit codes for
> # the "status" command; see LSB core spec 3.1, section 20.2)
> #
> CMD_PATT="cassandra.+CassandraDaemon"
> is_running()
> {
> if [ -f $PIDFILE ]; then
> pid=`cat $PIDFILE`
> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
> return 1
> fi
> return 3
> }
> #
> # Function that starts the daemon/service
> #
> do_start()
> {
> # Return
> #   0 if daemon has been started
> #   1 if daemon was already running
> #   2 if daemon could not be started
>
> ulimit -l unlimited
> ulimit -n "$FD_LIMIT"
>
> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
>
> [ -e `dirname "$PIDFILE"` ] || \
> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
>
>
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q \
> -p "$PIDFILE" -t >/dev/null || return 1
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2
>
> }
>
> #
> # Function that stops the daemon/service
> #
> do_stop()
> {
> # Return
> #   0 if daemon has been stopped
> #   1 if daemon was already stopped
> #   2 if daemon could not be stopped
> #   other if a failure occurred
> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
> RET=$?
> rm -f "$PIDFILE"
> return $RET
> }
>
> case "$1" in
>   start)
> [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
> do_start
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   stop)
> [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
> do_stop
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   restart|force-reload)
> log_daemon_msg "Restarting $DESC" "$NAME"
> do_stop
> case "$?" in
>   

Incremental Repair Migration

2017-01-09 Thread Amit Singh F
Hi All,

We are thinking of migrating from primary range repair (-pr) to incremental 
repair.

Environment :


* Cassandra 2.1.16
* 25-node cluster
* RF 3
* Data size up to 450 GB per node

We found that a full repair takes around 8 hours per node, which means 200-odd
hours to migrate the entire cluster to incremental repair. Even though there is
zero downtime, it is quite unreasonable to ask for a 200-hour maintenance
window just for migrating repairs.

We just want to know how Cassandra users in the community optimize this
procedure to reduce the migration time.

Thanks & Regards
Amit Singh


Re: Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Bhuvan Rawal
Hi Ajay,

Have you had a look at cron logs? - mine is in path /var/log/cron

Thanks & Regards,

On Tue, Jan 10, 2017 at 9:45 AM, Ajay Garg  wrote:

> Hi All.
>
> Facing a very weird issue, wherein the command
>
> */etc/init.d/cassandra start*
>
> causes cassandra to start when the command is run from the command line.
>
>
> However, if I put the above as a cron job
>
>
>
> ** * * * * /etc/init.d/cassandra start*
> cassandra never starts.
>
>
> I have checked, and "cron" service is running.
>
>
> Any ideas what might be wrong?
> I am pasting the cassandra init script for reference.
>
>
> Thanks and Regards,
> Ajay
>
>
> 
> 
> #! /bin/sh
> ### BEGIN INIT INFO
> # Provides:  cassandra
> # Required-Start:$remote_fs $network $named $time
> # Required-Stop: $remote_fs $network $named $time
> # Should-Start:  ntp mdadm
> # Should-Stop:   ntp mdadm
> # Default-Start: 2 3 4 5
> # Default-Stop:  0 1 6
> # Short-Description: distributed storage system for structured data
> # Description:   Cassandra is a distributed (peer-to-peer) system for
> #the management and storage of structured data.
> ### END INIT INFO
>
> # Author: Eric Evans 
>
> DESC="Cassandra"
> NAME=cassandra
> PIDFILE=/var/run/$NAME/$NAME.pid
> SCRIPTNAME=/etc/init.d/$NAME
> CONFDIR=/etc/cassandra
> WAIT_FOR_START=10
> CASSANDRA_HOME=/usr/share/cassandra
> FD_LIMIT=10
>
> [ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
> [ -e /etc/cassandra/cassandra.yaml ] || exit 0
> [ -e /etc/cassandra/cassandra-env.sh ] || exit 0
>
> # Read configuration variable file if it is present
> [ -r /etc/default/$NAME ] && . /etc/default/$NAME
>
> # Read Cassandra environment file.
> . /etc/cassandra/cassandra-env.sh
>
> if [ -z "$JVM_OPTS" ]; then
> echo "Initialization failed; \$JVM_OPTS not set!" >&2
> exit 3
> fi
>
> export JVM_OPTS
>
> # Export JAVA_HOME, if set.
> [ -n "$JAVA_HOME" ] && export JAVA_HOME
>
> # Load the VERBOSE setting and other rcS variables
> . /lib/init/vars.sh
>
> # Define LSB log_* functions.
> # Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
> . /lib/lsb/init-functions
>
> #
> # Function that returns 0 if process is running, or nonzero if not.
> #
> # The nonzero value is 3 if the process is simply not running, and 1 if the
> # process is not running but the pidfile exists (to match the exit codes for
> # the "status" command; see LSB core spec 3.1, section 20.2)
> #
> CMD_PATT="cassandra.+CassandraDaemon"
> is_running()
> {
> if [ -f $PIDFILE ]; then
> pid=`cat $PIDFILE`
> grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
> return 1
> fi
> return 3
> }
> #
> # Function that starts the daemon/service
> #
> do_start()
> {
> # Return
> #   0 if daemon has been started
> #   1 if daemon was already running
> #   2 if daemon could not be started
>
> ulimit -l unlimited
> ulimit -n "$FD_LIMIT"
>
> cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
> heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
> error_log_f="$cassandra_home/hs_err_`date +%s`.log"
>
> [ -e `dirname "$PIDFILE"` ] || \
> install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`
>
>
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q \
> -p "$PIDFILE" -t >/dev/null || return 1
>
> start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
> -p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2
>
> }
>
> #
> # Function that stops the daemon/service
> #
> do_stop()
> {
> # Return
> #   0 if daemon has been stopped
> #   1 if daemon was already stopped
> #   2 if daemon could not be stopped
> #   other if a failure occurred
> start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
> RET=$?
> rm -f "$PIDFILE"
> return $RET
> }
>
> case "$1" in
>   start)
> [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
> do_start
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   stop)
> [ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
> do_stop
> case "$?" in
> 0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
> 2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
> esac
> ;;
>   restart|force-reload)
> log_daemon_msg "Restarting $DESC" "$NAME"
> do_stop
> case "$?" in
>   0|1)
> do_start
> case "$?" in
> 0) log_end_msg 0 ;;
>   

Strange issue wherein cassandra not being started from cron

2017-01-09 Thread Ajay Garg
Hi All.

Facing a very weird issue, wherein the command

*/etc/init.d/cassandra start*

causes cassandra to start when the command is run from the command line.


However, if I put the above as a cron job



** * * * * /etc/init.d/cassandra start*
cassandra never starts.


I have checked, and "cron" service is running.


Any ideas what might be wrong?
I am pasting the cassandra init script for reference.


Thanks and Regards,
Ajay



#! /bin/sh
### BEGIN INIT INFO
# Provides:  cassandra
# Required-Start:$remote_fs $network $named $time
# Required-Stop: $remote_fs $network $named $time
# Should-Start:  ntp mdadm
# Should-Stop:   ntp mdadm
# Default-Start: 2 3 4 5
# Default-Stop:  0 1 6
# Short-Description: distributed storage system for structured data
# Description:   Cassandra is a distributed (peer-to-peer) system for
#the management and storage of structured data.
### END INIT INFO

# Author: Eric Evans 

DESC="Cassandra"
NAME=cassandra
PIDFILE=/var/run/$NAME/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
CONFDIR=/etc/cassandra
WAIT_FOR_START=10
CASSANDRA_HOME=/usr/share/cassandra
FD_LIMIT=10

[ -e /usr/share/cassandra/apache-cassandra.jar ] || exit 0
[ -e /etc/cassandra/cassandra.yaml ] || exit 0
[ -e /etc/cassandra/cassandra-env.sh ] || exit 0

# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME

# Read Cassandra environment file.
. /etc/cassandra/cassandra-env.sh

if [ -z "$JVM_OPTS" ]; then
echo "Initialization failed; \$JVM_OPTS not set!" >&2
exit 3
fi

export JVM_OPTS

# Export JAVA_HOME, if set.
[ -n "$JAVA_HOME" ] && export JAVA_HOME

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that returns 0 if process is running, or nonzero if not.
#
# The nonzero value is 3 if the process is simply not running, and 1 if the
# process is not running but the pidfile exists (to match the exit codes for
# the "status" command; see LSB core spec 3.1, section 20.2)
#
CMD_PATT="cassandra.+CassandraDaemon"
is_running()
{
if [ -f $PIDFILE ]; then
pid=`cat $PIDFILE`
grep -Eq "$CMD_PATT" "/proc/$pid/cmdline" 2>/dev/null && return 0
return 1
fi
return 3
}
#
# Function that starts the daemon/service
#
do_start()
{
# Return
#   0 if daemon has been started
#   1 if daemon was already running
#   2 if daemon could not be started

ulimit -l unlimited
ulimit -n "$FD_LIMIT"

cassandra_home=`getent passwd cassandra | awk -F ':' '{ print $6; }'`
heap_dump_f="$cassandra_home/java_`date +%s`.hprof"
error_log_f="$cassandra_home/hs_err_`date +%s`.log"

[ -e `dirname "$PIDFILE"` ] || \
install -d -ocassandra -gcassandra -m755 `dirname $PIDFILE`



start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -q \
-p "$PIDFILE" -t >/dev/null || return 1

start-stop-daemon -S -c cassandra -a /usr/sbin/cassandra -b -p "$PIDFILE" -- \
-p "$PIDFILE" -H "$heap_dump_f" -E "$error_log_f" >/dev/null || return 2

}

#
# Function that stops the daemon/service
#
do_stop()
{
# Return
#   0 if daemon has been stopped
#   1 if daemon was already stopped
#   2 if daemon could not be stopped
#   other if a failure occurred
start-stop-daemon -K -p "$PIDFILE" -R TERM/30/KILL/5 >/dev/null
RET=$?
rm -f "$PIDFILE"
return $RET
}

case "$1" in
  start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  restart|force-reload)
log_daemon_msg "Restarting $DESC" "$NAME"
do_stop
case "$?" in
  0|1)
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log_end_msg 1 ;; # Old process is still running
*) log_end_msg 1 ;; # Failed to start
esac
;;
  *)
# Failed to stop
log_end_msg 1
;;
esac
;;
  status)
is_running
stat=$?
case "$stat" in
  0) log_success_msg "$DESC is running" ;;
  1) log_failure_msg "could not access pidfile for $DESC" ;;
  *) log_success_msg "$DESC is not 

Re: Cassandra cluster performance

2017-01-09 Thread Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)
Hi, we have made some changes to our code and benchmarking, and now it seems to
scale. Async writes plus those changes made the difference. So for now, thank
you very much everyone for the help. Much appreciated.

Branislav


From: Jonathan Haddad 
Reply-To: "user@cassandra.apache.org" 
Date: Sunday, January 8, 2017 at 8:01 PM
To: "user@cassandra.apache.org" 
Cc: Abhishek Kumar Maheshwari 
Subject: Re: Cassandra cluster performance

Can you share your benchmarking code?
On Sun, Jan 8, 2017 at 5:51 PM Branislav Janosik -T (bjanosik - AAP3 INC at 
Cisco) > wrote:

Our test data is just a couple of short strings; the load on the nodes is just
382 KiB and 408 KiB.

I read some articles about async writes and switched from execute to
executeAsync for the writes. The results seem to be the same (not good). Is
there more that should be done when doing async writes?


From: Kant Kodali >
Reply-To: "user@cassandra.apache.org" 
>
Date: Friday, January 6, 2017 at 6:05 AM
To: "user@cassandra.apache.org" 
>
Cc: Abhishek Kumar Maheshwari 
>

Subject: Re: Cassandra cluster performance

Yeah, you should use async writes. Also, you cannot neglect data size, so you
might want to let us know what your data size is.



On Thu, Jan 5, 2017 at 2:57 PM, kurt Greaves 
> wrote:
You should try switching to async writes and then perform the test. Sync writes
won't make much difference on a single node, but with multiple nodes there
should be a massive difference.

On 4 Jan 2017 10:05, "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" 
> wrote:
Hi,

Our column family definition is

"CREATE TABLE onem2m.cse(" +
"name TEXT PRIMARY KEY," +
"resourceId TEXT," +
")";
"CREATE TABLE IF NOT EXISTS onem2m.AeIdToResourceIdMapping(" +
"cseBaseCseId TEXT," +
"aeId TEXT," +
"resourceId TEXT," +
"PRIMARY KEY ((cseBaseCseId), aeId)" +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Resources_" + i + "(" +
"CONTENT_INSTANCE_OldestId TEXT," +
"CONTENT_INSTANCE_LatestId TEXT," +
"SUBSCRIPTION_OldestId TEXT," +
"SUBSCRIPTION_LatestId TEXT," +
"resourceId TEXT PRIMARY KEY," +
"resourceType TEXT," +
"resourceName TEXT," +
"jsonContent TEXT," +
"parentId TEXT," +
")";

"CREATE TABLE IF NOT EXISTS onem2m.Children_" + i + "(" +
"parentResourceId TEXT," +
"childName TEXT," +
"childResourceId TEXT," +
"nextId TEXT," +
"prevId TEXT," +
"PRIMARY KEY ((parentResourceId), childName)" +
")";



From: Abhishek Kumar Maheshwari 
>
Date: Sunday, December 25, 2016 at 8:54 PM
To: "Branislav Janosik -T (bjanosik - AAP3 INC at Cisco)" 
>
Cc: "user@cassandra.apache.org" 
>
Subject: RE: Cassandra cluster performance

Hi Branislav,


What is your column family definition?


Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread
environmental awareness.

From: Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) 
[mailto:bjano...@cisco.com]
Sent: Thursday, December 22, 2016 6:18 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra cluster performance

Hi,

- Consistency level is set to ONE
-  Keyspace definition:

"CREATE KEYSPACE  IF NOT EXISTS  onem2m " +
"WITH replication = " +
"{ 'class' : 'SimpleStrategy', 'replication_factor' : 1}";



- Yes, the client is on a separate VM.

- In our project we use Cassandra API version 3.0.2, but the database (cluster)
is version 3.9.

- For the 2-node cluster:

 first VM: 25 GB RAM, 16 CPUs

 second VM: 16 GB RAM, 16 CPUs




From: Ben Slater >
Reply-To: "user@cassandra.apache.org" 
>
Date: Wednesday, December 21, 2016 at 2:32 PM
To: 

Trying to identify the cause of these errors.

2017-01-09 Thread Gopal, Dhruva
My colleague (Richard Ney) has already been in touch with you on a couple of
other issues we've seen in the past. Our development team has been trying to
track down some new issues we've been seeing on one of our pre-prod
environments, where we've been having consistent failures very often (every
day, or every 2-3 days), even when the load/number of transactions is very
light. We're running a 2-data-center deployment with 3 nodes in each data
center. Our tables are set up with replication factor = 2, and we have 16 GB
dedicated to the heap, with G1GC for garbage collection. Our systems are AWS
m4.2xlarge with 8 CPUs and 32 GB of RAM, and we have 2 general-purpose EBS
volumes of 500 GB each on every node.

Once we hit this, it seems the only way to recover is to shut down the cluster
and restart. Running repairs after the restart often results in failures, and
we pretty much end up having to truncate the tables before starting up clean
again. We are not sure if the two are inter-related, and we see pretty much the
same issue on all the nodes. If anyone has any tips or suggestions on how to
diagnose this further, it will help a great deal! The issues are:



Issue 1: Once the errors occur, they just repeat for a while, followed by the
errors in issue 2.

INFO  [CompactionExecutor:165] 2017-01-08 08:32:39,915 AutoSavingCache.java:386 
- Saved KeyCache (63 items) in 5 ms
INFO  [IndexSummaryManager:1] 2017-01-08 08:32:41,438 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [HANDSHAKE-ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224] 
2017-01-08 09:30:03,988 OutboundTcpConnection.java:505 - Handshaking version 
with ahldataslave4.bos.manhattan.aspect-cloud.net/10.184.8.224
INFO  [IndexSummaryManager:1] 2017-01-08 09:32:41,440 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN  [SharedPool-Worker-9] 2017-01-08 10:30:00,116 BatchStatement.java:289 - 
Batch of prepared statements for [manhattan.rcmessages] is of size 9264, 
exceeding specified threshold of 5120 by 4144.
INFO  [IndexSummaryManager:1] 2017-01-08 10:32:41,442 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 11:32:41,443 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [CompactionExecutor:162] 2017-01-08 12:32:39,914 AutoSavingCache.java:386 
- Saved KeyCache (108 items) in 4 ms
INFO  [IndexSummaryManager:1] 2017-01-08 12:32:41,446 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 13:32:41,448 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 14:32:41,450 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [IndexSummaryManager:1] 2017-01-08 15:32:41,451 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
INFO  [CompactionExecutor:170] 2017-01-08 16:32:39,915 AutoSavingCache.java:386 
- Saved KeyCache (109 items) in 4 ms
INFO  [IndexSummaryManager:1] 2017-01-08 16:32:41,453 
IndexSummaryRedistribution.java:74 - Redistributing index summaries
WARN  [SharedPool-Worker-4] 2017-01-08 17:30:45,048 
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-4,5,main]: {}
java.lang.AssertionError: null
at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.lambda$purge$107(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.transformAndFilter(ComplexColumnData.java:170) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:43) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.db.rows.BTreeRow.lambda$purge$102(BTreeRow.java:333) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
at
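
Two quick checks related to the log excerpts above (a sketch; the paths assume
a stock package install). The batch-size WARN reports the thresholds currently
in effect, which live in cassandra.yaml, and with G1 on a 16 GB heap it is
cheap to correlate the error timestamps with GC pauses first:

    # Current batch thresholds; raise them only if large batches are intentional.
    grep -E 'batch_size_(warn|fail)_threshold_in_kb' /etc/cassandra/cassandra.yaml
    # Long GC pauses around the failure window.
    grep GCInspector /var/log/cassandra/system.log | tail -20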

Re: collecting metrics at column-family level

2017-01-09 Thread Jacob Shadix
I found a couple of metrics that should suffice for #2 and #3: TBL: Live
Disk Used and TBL: Local Write Latency.

   1. count of number of records inserted within given timeframe
   2. data growth
   3. write latency

-- Jacob Shadix

On Mon, Jan 9, 2017 at 2:29 PM, Jacob Shadix  wrote:

> Is it possible to report on the following metrics at table level?
>
>1. count of number of records inserted within given timeframe
>2. data growth
>3. write latency
>
> -- Jacob Shadix
>
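
A sketch of where those three items surface in stock tooling (keyspace/table
names are placeholders; the command is tablestats on nodetool 3.x, cfstats on
older versions):

    nodetool tablestats my_ks.my_table
    # "Local write count"   -> a counter to diff across a timeframe (#1)
    # "Space used (live)"   -> data growth when sampled over time (#2)
    # "Local write latency" -> write latency (#3)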


collecting metrics at column-family level

2017-01-09 Thread Jacob Shadix
Is it possible to report on the following metrics at table level?

   1. count of number of records inserted within given timeframe
   2. data growth
   3. write latency

-- Jacob Shadix


Re: Help

2017-01-09 Thread Chris Lohfink
Do you have any monitoring setup around garbage collections?  A GC +
network latency > write timeout will cause intermittent hints.
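
A quick sketch of that GC check (log locations assume a stock install, and
GCInspector only logs pauses above its reporting threshold):

    # Pauses Cassandra itself noticed.
    grep GCInspector /var/log/cassandra/system.log | tail
    # Full stop-the-world accounting, if GC logging is enabled.
    grep 'Total time for which application threads were stopped' /var/log/cassandra/gc.log | tail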

On Sun, Jan 8, 2017 at 10:30 PM, Anshu Vajpayee 
wrote:

> Gossip shows all nodes are up.
>
> But when we perform writes, the coordinator stores hints. This means the
> coordinator was not able to deliver the writes to a few nodes, despite
> meeting the consistency requirements.
>
> The nodes for which writes were failing are in a different DC. Those nodes
> do not have any load.
>
> Gossip shows everything is up. I already set the write timeout to 60 sec,
> but it didn't help.
>
> Has anyone encountered this scenario? On the network side, everything is fine.
>
> Cassandra version is 2.1.13
>
> --
> *Regards,*
> *Anshu *
>
>
>


Re: Help

2017-01-09 Thread Edward Capriolo
On Sun, Jan 8, 2017 at 11:30 PM, Anshu Vajpayee 
wrote:

> Gossip shows all nodes are up.
>
> But when we perform writes, the coordinator stores hints. This means the
> coordinator was not able to deliver the writes to a few nodes, despite
> meeting the consistency requirements.
>
> The nodes for which writes were failing are in a different DC. Those nodes
> do not have any load.
>
> Gossip shows everything is up. I already set the write timeout to 60 sec,
> but it didn't help.
>
> Has anyone encountered this scenario? On the network side, everything is fine.
>
> Cassandra version is 2.1.13
>
> --
> *Regards,*
> *Anshu *
>
>
>
This suggests you have some intermittent network issues. I would suggest
using query tracing

http://cassandra.apache.org/doc/latest/tools/cqlsh.html

Hopefully you can use that to determine why some operations are failing.
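
A cqlsh sketch of that tracing suggestion (the INSERT is a placeholder for one
of the writes that is ending up as hints):

    # Trace a single statement end-to-end across replicas.
    cqlsh -e "TRACING ON; INSERT INTO my_ks.my_table (id, val) VALUES ('k','v'); TRACING OFF;"
    # Past sessions remain queryable afterwards.
    cqlsh -e "SELECT session_id, coordinator, duration FROM system_traces.sessions LIMIT 10;"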


Re: Incremental repair for the first time

2017-01-09 Thread Kathiresan S
Thanks Amit & Oskar

Thanks,
Kathir

On Mon, Jan 9, 2017 at 3:23 AM, Oskar Kjellin 
wrote:

> There is no harm in running it tho. If it's not needed it will simply
> terminate. Better to be safe
>
> Sent from my iPhone
>
> On 9 Jan 2017, at 08:13, Amit Singh F  wrote:
>
> Hi ,
>
>
>
> Generally, upgradesstables is only recommended when you move between major
> versions, e.g. from 2.0 to 2.1 or from 2.1 to 2.2. Since you are doing a
> minor version upgrade, there is no need to run the upgradesstables utility.
>
>
>
> Link by Datastax might be helpful to you :
>
>
>
> https://support.datastax.com/hc/en-us/articles/208040036-Nodetool-upgradesstables-FAQ
>
>
>
> *From:* Kathiresan S [mailto:kathiresanselva...@gmail.com
> ]
> *Sent:* Wednesday, January 04, 2017 12:22 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Incremental repair for the first time
>
>
>
> Thank you!
>
>
>
> We are planning to upgrade to 3.0.10 for this issue.
>
>
>
> From the NEWS txt file (https://github.com/apache/cassandra/blob/trunk/NEWS.txt),
> it looks like there is no need for sstableupgrade when we upgrade from 3.0.4
> to 3.0.10 (i.e. just installing 3.0.10 Cassandra would suffice and it will
> work with the sstables created by 3.0.4?)
>
>
>
> Could you please confirm (if i'm reading the upgrade instructions
> correctly)?
>
>
>
> Thanks,
>
> Kathir
>
>
>
> On Tue, Dec 20, 2016 at 5:28 PM, kurt Greaves 
> wrote:
>
> No workarounds, your best/only option is to upgrade (plus you get the
> benefit of loads of other bug fixes).
>
>
>
> On 16 December 2016 at 21:58, Kathiresan S 
> wrote:
>
> Thank you!
>
>
>
> Is any work around available for this version?
>
>
>
> Thanks,
>
> Kathir
>
>
>
> On Friday, December 16, 2016, Jake Luciani  wrote:
>
> This was fixed post 3.0.4 please upgrade to latest 3.0 release
>
>
>
> On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S <
> kathiresanselva...@gmail.com> wrote:
>
> Hi,
>
>
>
> We have a brand new Cassandra cluster (version 3.0.4) and we set up
> nodetool repair scheduled for every day (without any options for repair).
> As per documentation, incremental repair is the default in this case.
>
> Should we do a full repair for the very first time on each node once and
> then leave it to do incremental repair afterwards?
>
>
>
> *Problem we are facing:*
>
>
>
> On a random node, the repair process throws validation failed error,
> pointing to some other node
>
>
>
> For Eg. Node A, where the repair is run (without any option), throws below
> error
>
>
>
> *Validation failed in /Node B*
>
>
>
> In Node B when we check the logs, below exception is seen at the same
> exact time...
>
>
>
> *java.lang.RuntimeException: Cannot start multiple repair sessions over
> the same sstables*
>
> *at
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>
> *at
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>
> *at
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
> ~[apache-cassandra-3.0.4.jar:3.0.4]*
>
> *at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_73]*
>
> *at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_73]*
>
>
>
> Can you please help on how this can be fixed?
>
>
>
> Thanks,
>
> Kathir
>
>
>
>
> --
>
> http://twitter.com/tjake
>
>
>
>
>
>


Re: Incremental repair for the first time

2017-01-09 Thread Oskar Kjellin
There is no harm in running it tho. If it's not needed it will simply 
terminate. Better to be safe

Sent from my iPhone

> On 9 Jan 2017, at 08:13, Amit Singh F  wrote:
> 
> Hi ,
>  
> Generally, upgradesstables is only recommended when you move between major
> versions, e.g. from 2.0 to 2.1 or from 2.1 to 2.2. Since you are doing a
> minor version upgrade, there is no need to run the upgradesstables utility.
>  
> Link by Datastax might be helpful to you :
>  
> https://support.datastax.com/hc/en-us/articles/208040036-Nodetool-upgradesstables-FAQ
>  
> From: Kathiresan S [mailto:kathiresanselva...@gmail.com] 
> Sent: Wednesday, January 04, 2017 12:22 AM
> To: user@cassandra.apache.org
> Subject: Re: Incremental repair for the first time
>  
> Thank you!
>  
> We are planning to upgrade to 3.0.10 for this issue.
>  
> From the NEWS txt file 
> (https://github.com/apache/cassandra/blob/trunk/NEWS.txt), it looks like 
> there is no need for sstableupgrade when we upgrade from 3.0.4 to 3.0.10 
> (i.e. Just installing 3.0.10 Cassandra would suffice and it will work with 
> the sstables created by 3.0.4 ?)
>  
> Could you please confirm (if i'm reading the upgrade instructions correctly)?
>  
> Thanks,
> Kathir
>  
> On Tue, Dec 20, 2016 at 5:28 PM, kurt Greaves  wrote:
> No workarounds, your best/only option is to upgrade (plus you get the benefit 
> of loads of other bug fixes).
>  
> On 16 December 2016 at 21:58, Kathiresan S  
> wrote:
> Thank you!
>  
> Is any work around available for this version? 
>  
> Thanks,
> Kathir
> 
> 
> On Friday, December 16, 2016, Jake Luciani  wrote:
> This was fixed post 3.0.4 please upgrade to latest 3.0 release
>  
> On Fri, Dec 16, 2016 at 4:49 PM, Kathiresan S  
> wrote:
> Hi,
>  
> We have a brand new Cassandra cluster (version 3.0.4) and we set up nodetool 
> repair scheduled for every day (without any options for repair). As per 
> documentation, incremental repair is the default in this case. 
> Should we do a full repair for the very first time on each node once and then 
> leave it to do incremental repair afterwards?
>  
> Problem we are facing:
>  
> On a random node, the repair process throws validation failed error, pointing 
> to some other node
>  
> For Eg. Node A, where the repair is run (without any option), throws below 
> error
>  
> Validation failed in /Node B
>  
> In Node B when we check the logs, below exception is seen at the same exact 
> time...
>  
> java.lang.RuntimeException: Cannot start multiple repair sessions over the 
> same sstables
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1087)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:700)
>  ~[apache-cassandra-3.0.4.jar:3.0.4]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_73]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_73]
>  
> Can you please help on how this can be fixed?
>  
> Thanks,
> Kathir
> 
> 
> 
> --
> http://twitter.com/tjake
>  
>
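
For reference, a sketch of the nodetool side of that advice (run per node;
keyspace/table arguments are optional):

    # Skips sstables already on the current format; after a minor upgrade the
    # old sstables remain readable either way, per the advice above.
    nodetool upgradesstables
    # -a forces a rewrite of all sstables regardless of version.
    nodetool upgradesstables -a my_ks my_table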