[jira] [Created] (CASSANDRA-9274) Changing memtable_flush_writes per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small

2015-04-30 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-9274:
---

 Summary: Changing memtable_flush_writes per recommendations in 
cassandra.yaml causes  memtable_cleanup_threshold to be too small
 Key: CASSANDRA-9274
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9274
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor


It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.
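
For reference, here is roughly how the numbers work out under the formula quoted above (a sketch of the settings involved; the exact warning cutoff in DatabaseDescriptor is not reproduced here):
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
#   memtable_flush_writers: 8   ->  1 / (8 + 1)  ~= 0.11   (the commented-out default)
#   memtable_flush_writers: 24  ->  1 / (24 + 1)  = 0.04   (low enough to trigger the warning)
memtable_flush_writers: 24
memtable_cleanup_threshold: 0.11
{noformat}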



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9274) Changing memtable_flush_writes per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small

2015-04-30 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-9274:

Description: 
It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.


  was:
It says in cassandra.yaml:
{noformat}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
{noformat}
so we raised it to 24.

Much later we noticed a warning in the logs:
{noformat}
WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
memtable_cleanup_threshold is set very low, which may cause performance 
degradation
{noformat}
Looking at cassandra.yaml again I see:
{noformat}
# memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
# memtable_cleanup_threshold: 0.11
#memtable_cleanup_threshold: 0.11
{noformat}
So, I uncommented that last line (figuring that 0.11 is a reasonable value).

Cassandra.yaml should give better guidance or the code should *prevent* the 
value from going outside a reasonable range.


 Changing memtable_flush_writes per recommendations in cassandra.yaml causes  
 memtable_cleanup_threshold to be too small
 ---

 Key: CASSANDRA-9274
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9274
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 It says in cassandra.yaml:
 {noformat}
 # If your data directories are backed by SSD, you should increase this
 # to the number of cores.
 #memtable_flush_writers: 8
 {noformat}
 so we raised it to 24.
 Much later we noticed a warning in the logs:
 {noformat}
 WARN  [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - 
 memtable_cleanup_threshold is set very low, which may cause performance 
 degradation
 {noformat}
 Looking at cassandra.yaml again I see:
 {noformat}
 # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1)
 # memtable_cleanup_threshold: 0.11
 #memtable_cleanup_threshold: 0.11
 {noformat}
 So, I uncommented that last line (figuring that 0.11 is a reasonable value).
 Cassandra.yaml should give better guidance or the code should *prevent* the 
 value from going outside a reasonable range.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8990) Allow clients to override the DCs the data gets sent to, per write request, overriding keyspace settings

2015-03-18 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8990:

Description: 
Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers 
(probably restricted to being a subset of the DCs specified in the keyspace).
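
For context, the two-keyspace workaround the description calls burdensome looks roughly like this today (a sketch; the keyspace names are invented for illustration):
{noformat}
-- Rows that must replicate everywhere:
CREATE KEYSPACE global_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'};

-- Rows that should stay in DC1 only:
CREATE KEYSPACE dc1_only_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'};
{noformat}
Application code then has to decide, per write, which keyspace's table to target, which is the duplication this ticket would like to avoid.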


  was:
Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers. 



 Allow clients to override the DCs the data gets sent to, per write request, 
 overriding keyspace settings
 

 Key: CASSANDRA-8990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8990
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith

 Currently each keyspace specifies how many replicas to write to each data 
 center. In CQL one specifies:
 {noformat}
  WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'DC1': '3',
   'DC2': '3'
 }
 {noformat}
 But in some use cases there's no need to write certain rows to a certain 
 datacenter.  Requiring the user to create two keyspaces is burdensome and 
 complicates code and queries.  
 For example, we have global replication of our data to multiple continents. 
 But we want the option to send only certain rows globally with certain values 
 for certain columns -- e.g., only for users that visited that country.
 Cassandra and CQL should support the ability of client code to specify, on a 
 per request basis, that a write should go only to specified data centers 
 (probably restricted to being a subset of the DCs specified in the keyspace).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8990) Allow clients to override the DCs the data gets sent to, per write request, overriding keyspace settings

2015-03-18 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8990:
---

 Summary: Allow clients to override the DCs the data gets sent to, 
per write request, overriding keyspace settings
 Key: CASSANDRA-8990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8990
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith


Currently each keyspace specifies how many replicas to write to each data 
center. In CQL one specifies:
{noformat}
 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
}
{noformat}
But in some use cases there's no need to write certain rows to a certain 
datacenter.  Requiring the user to create two keyspaces is burdensome and 
complicates code and queries.  

For example, we have global replication of our data to multiple continents. But 
we want the option to send only certain rows globally with certain values for 
certain columns -- e.g., only for users that visited that country.

Cassandra and CQL should support the ability of client code to specify, on a 
per request basis, that a write should go only to specified data centers. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8738:

Description: 
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *stopped* ]]
! then
!echo OK
! else
!echo ERROR: could not stop the process: $THE_STATUS
! fi
  ;;
--- 69,71 
  sleep 5
! echo OK
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}

  was:
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *stopped* ]]
! then
!echo OK
! else
!echo ERROR: could not stop the process: $THE_STATUS
! fi
  ;;
--- 69,71 
  sleep 5
! echo OK
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}


 /etc/init.d/cassandra stop prints OK even when it doesn't work
 

 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
 server is still running.  (This happens, for example, if it's busy doing 
 GCs.)  The current init script prints out OK after sleeping but without 
 checking if the process really stopped. I suggest changing it to:
 {noformat}
 pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
 *** cassandra   2015-02-04 09:15:58.088209988 -0800
 --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
 ***
 *** 69,77 
   sleep 5
 ! THE_STATUS=`$0 status`
 ! if [[ $THE_STATUS == *stopped* ]]
 ! then
 !echo OK
 ! else
 !echo ERROR: could not stop the process: $THE_STATUS
 ! fi
   ;;
 --- 69,71 
   sleep 5
 ! echo OK
   ;;
 {noformat}
 Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
 message like
 {quote}
 ERROR: could not stop the process: cassandra (pid  10764) is running...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8738:

Description: 
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *stopped* ]]
! then
!echo OK
! else
!echo ERROR: could not stop the process: $THE_STATUS
!exit 1
! fi
  ;;
--- 69,71 
  sleep 5
! echo OK
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}

  was:
Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *stopped* ]]
! then
!echo OK
! else
!echo ERROR: could not stop the process: $THE_STATUS
! fi
  ;;
--- 69,71 
  sleep 5
! echo OK
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}


 /etc/init.d/cassandra stop prints OK even when it doesn't work
 

 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
 server is still running.  (This happens, for example, if it's busy doing 
 GCs.)  The current init script prints out OK after sleeping but without 
 checking if the process really stopped. I suggest changing it to:
 {noformat}
 pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
 *** cassandra   2015-02-04 09:15:58.088209988 -0800
 --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
 ***
 *** 69,77 
   sleep 5
 ! THE_STATUS=`$0 status`
 ! if [[ $THE_STATUS == *stopped* ]]
 ! then
 !echo OK
 ! else
 !echo ERROR: could not stop the process: $THE_STATUS
 !exit 1
 ! fi
   ;;
 --- 69,71 
   sleep 5
 ! echo OK
   ;;
 {noformat}
 Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
 message like
 {quote}
 ERROR: could not stop the process: cassandra (pid  10764) is running...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305548#comment-14305548
 ] 

Donald Smith edited comment on CASSANDRA-8738 at 2/4/15 5:36 PM:
-

Here's the change in context:
{noformat}
    stop)
        # Cassandra shutdown
        echo -n "Shutdown Cassandra: "
        su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
        for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
        # Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
        # Adam Holmberg suggests this, but that would break if the jmx port is changed
        # for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
        sleep 5
        THE_STATUS=`$0 status`
        if [[ $THE_STATUS == *stopped* ]]
        then
            echo OK
        else
            echo "ERROR: could not stop the process: $THE_STATUS"
            exit 1
        fi
        ;;
{noformat}


was (Author: thinkerfeeler):
Here's the change in context:
{noformat}
    stop)
        # Cassandra shutdown
        echo -n "Shutdown Cassandra: "
        su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
        for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
        # Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
        # Adam Holmberg suggests this, but that would break if the jmx port is changed
        # for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
        sleep 5
        THE_STATUS=`$0 status`
        if [[ $THE_STATUS == *stopped* ]]
        then
            echo OK
        else
            echo "ERROR: could not stop the process: $THE_STATUS"
        fi
        ;;
{noformat}

 /etc/init.d/cassandra stop prints OK even when it doesn't work
 

 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
 server is still running.  (This happens, for example, if it's busy doing 
 GCs.)  The current init script prints out OK after sleeping but without 
 checking if the process really stopped. I suggest changing it to:
 {noformat}
 pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
 *** cassandra   2015-02-04 09:15:58.088209988 -0800
 --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
 ***
 *** 69,77 
   sleep 5
 ! THE_STATUS=`$0 status`
 ! if [[ $THE_STATUS == *stopped* ]]
 ! then
 !echo OK
 ! else
 !echo ERROR: could not stop the process: $THE_STATUS
 !exit 1
 ! fi
   ;;
 --- 69,71 
   sleep 5
 ! echo OK
   ;;
 {noformat}
 Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
 message like
 {quote}
 ERROR: could not stop the process: cassandra (pid  10764) is running...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305578#comment-14305578
 ] 

Donald Smith commented on CASSANDRA-8738:
-

I'm using 2.0.11. But I see that 2.1.1 has the same problem. It looks like 
3.0's version at https://github.com/apache/cassandra/blob/trunk/debian/init 
fixes it:
{noformat}
do_stop()
{
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
start-stop-daemon -K -p $PIDFILE -R TERM/30/KILL/5 >/dev/null
RET=$?
rm -f $PIDFILE
return $RET
}
{noformat}
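
A minimal sketch (hypothetical, not taken from the actual script) of how a stop handler could map those return codes onto the OK/ERROR behaviour this ticket asks for:
{noformat}
case "$1" in
  stop)
    do_stop
    case "$?" in
      0|1) echo OK ;;                                           # stopped, or already stopped
      2)   echo "ERROR: could not stop the process"; exit 1 ;;  # stop failed
    esac
    ;;
esac
{noformat}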

 /etc/init.d/cassandra stop prints OK even when it doesn't work
 

 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
 server is still running.  (This happens, for example, if it's busy doing 
 GCs.)  The current init script prints out OK after sleeping but without 
 checking if the process really stopped. I suggest changing it to:
 {noformat}
 pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
 *** cassandra   2015-02-04 09:15:58.088209988 -0800
 --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
 ***
 *** 69,77 
   sleep 5
 ! THE_STATUS=`$0 status`
 ! if [[ $THE_STATUS == *stopped* ]]
 ! then
 !echo OK
 ! else
 !echo ERROR: could not stop the process: $THE_STATUS
 !exit 1
 ! fi
   ;;
 --- 69,71 
   sleep 5
 ! echo OK
   ;;
 {noformat}
 Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
 message like
 {quote}
 ERROR: could not stop the process: cassandra (pid  10764) is running...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8738:
---

 Summary: /etc/init.d/cassandra stop prints OK even when it 
doesn't work
 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith


Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
server is still running.  (This happens, for example, if it's busy doing GCs.)  
The current init script prints out OK after sleeping but without checking if 
the process really stopped. I suggest changing it to:
{noformat}
pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
*** cassandra   2015-02-04 09:15:58.088209988 -0800
--- cassandra-original  2015-02-04 09:15:40.293767501 -0800
***
*** 69,77 
  sleep 5
! THE_STATUS=`$0 status`
! if [[ $THE_STATUS == *stopped* ]]
! then
!echo OK
! else
!echo ERROR: could not stop the process: $THE_STATUS
! fi
  ;;
--- 69,71 
  sleep 5
! echo OK
  ;;
{noformat}
Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
message like
{quote}
ERROR: could not stop the process: cassandra (pid  10764) is running...
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8738) /etc/init.d/cassandra stop prints OK even when it doesn't work

2015-02-04 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305548#comment-14305548
 ] 

Donald Smith commented on CASSANDRA-8738:
-

Here's the change in context:
{noformat}
    stop)
        # Cassandra shutdown
        echo -n "Shutdown Cassandra: "
        su $CASSANDRA_OWNR -c "kill `cat $pid_file`"
        for t in `seq 40`; do $0 status > /dev/null 2>&1 && sleep 0.5 || break; done
        # Adding a sleep here to give jmx time to wind down (CASSANDRA-4483). Not ideal...
        # Adam Holmberg suggests this, but that would break if the jmx port is changed
        # for t in `seq 40`; do netstat -tnlp | grep "0.0.0.0:7199" > /dev/null 2>&1 && sleep 0.1 || break; done
        sleep 5
        THE_STATUS=`$0 status`
        if [[ $THE_STATUS == *stopped* ]]
        then
            echo OK
        else
            echo "ERROR: could not stop the process: $THE_STATUS"
        fi
        ;;
{noformat}

 /etc/init.d/cassandra stop prints OK even when it doesn't work
 

 Key: CASSANDRA-8738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8738
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Sometimes I do {{/etc/init.d/cassandra stop}} and it prints out OK, but the 
 server is still running.  (This happens, for example, if it's busy doing 
 GCs.)  The current init script prints out OK after sleeping but without 
 checking if the process really stopped. I suggest changing it to:
 {noformat}
 pd0-cassandra16 ~ diff -C 1 cassandra cassandra-original
 *** cassandra   2015-02-04 09:15:58.088209988 -0800
 --- cassandra-original  2015-02-04 09:15:40.293767501 -0800
 ***
 *** 69,77 
   sleep 5
 ! THE_STATUS=`$0 status`
 ! if [[ $THE_STATUS == *stopped* ]]
 ! then
 !echo OK
 ! else
 !echo ERROR: could not stop the process: $THE_STATUS
 ! fi
   ;;
 --- 69,71 
   sleep 5
 ! echo OK
   ;;
 {noformat}
 Then it prints out OK only if the stop succeeded. Otherwise it prints out a 
 message like
 {quote}
 ERROR: could not stop the process: cassandra (pid  10764) is running...
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.
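
To make the ask concrete, something along these lines is what is being requested (the setting names below are invented purely for illustration; no such options exist in cassandra.yaml today):
{noformat}
# Hypothetical, illustrative only -- not real settings:
# let bootstrap/rebuild continue when sources cannot be found for some ranges,
# up to a configured number of failed ranges, and log a report of what was skipped.
# bootstrap_tolerate_missing_ranges: true
# bootstrap_max_failed_ranges: 100
{noformat}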

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to unable to find sufficient sources for 
streaming range.  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair. Our 
consistency requirements are low but not zero.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks and when we started cassandra back 
 up, some nodes ran out of disk space, due to operator miscalculation. 
 Thereafter, we've been unable to bootstrap new nodes, due to unable to find 
 sufficient sources for streaming range.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks, and when we started cassandra back 
 up, some nodes ran out of disk space, due to operator miscalculation. 
 Thereafter, we've been unable to bootstrap new nodes, due to unable to find 
 sufficient sources for streaming range.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair. Our consistency requirements aren't high but we prefer 
 as much consistency as cassandra can give us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements are low but not zero.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks, and when we started cassandra back 
 up, some nodes ran out of disk space, due to operator miscalculation. 
 Thereafter, we've been unable to bootstrap new nodes, due to unable to find 
 sufficient sources for streaming range.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  Faults happen.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.  Faults happen.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks, and when we started cassandra back 
 up, some nodes ran out of disk space, due to operator miscalculation. 
 Thereafter, we've been unable to bootstrap new nodes, due to unable to find 
 sufficient sources for streaming range.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair. Our consistency requirements aren't high but we prefer 
 as much consistency as cassandra can give us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-09 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14271721#comment-14271721
 ] 

Donald Smith commented on CASSANDRA-8494:
-

Tunable consistency is related:  don't fail if a range is missing. Be fault 
tolerant and bootstrap as much as it can.

 incremental bootstrap
 -

 Key: CASSANDRA-8494
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jon Haddad
Assignee: Yuki Morishita
Priority: Minor
  Labels: density
 Fix For: 3.0


 Current bootstrapping involves (to my knowledge) picking tokens and streaming 
 data before the node is available for requests.  This can be problematic with 
 fat nodes, since it may require 20TB of data to be streamed over before the 
 machine can be useful.  This can result in a massive window of time before 
 the machine can do anything useful.
 As a potential approach to mitigate the huge window of time before a node is 
 available, I suggest modifying the bootstrap process to only acquire a single 
 initial token before being marked UP.  This would likely be a configuration 
 parameter incremental_bootstrap or something similar.
 After the node is bootstrapped with this one token, it could go into UP 
 state, and could then acquire additional tokens (one or a handful at a time), 
 which would be streamed over while the node is active and serving requests.  
 The benefit here is that with the default 256 tokens a node could become an 
 active part of the cluster with less than 1% of its final data streamed over.
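 (Roughly: with the default num_tokens of 256 and even token ownership, a single 
 initial token corresponds to about 1/256, i.e. under 0.4%, of the node's 
 eventual data.)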



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  

If it can't find sources for some ranges, it should allow bootstrapping to 
continue and should print out a report about what ranges were missing.   Allow 
the bootstrap to be tunable, under control of parameters (allow up to 100 
failures, for example).

For many apps, it's far better to bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.  Faults happen.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks, and when we started cassandra back up, 
some nodes ran out of disk space, due to operator miscalculation. Thereafter, 
we've been unable to bootstrap new nodes, due to unable to find sufficient 
sources for streaming range.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair. 
Our consistency requirements aren't high but we prefer as much consistency as 
cassandra can give us.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.  
 If it can't find sources for some ranges, it should allow bootstrapping to 
 continue and should print out a report about what ranges were missing.   
 Allow the bootstrap to be tunable, under control of parameters (allow up to 
 100 failures, for example).
 For many apps, it's far better to bootstrap what's available than to fail 
 flat.
 Same with rebuilds.
 We were doing maintenance on some disks, and when we started cassandra back 
 up, some nodes ran out of disk space, due to operator miscalculation. 
 Thereafter, we've been unable to bootstrap new nodes, due to unable to find 
 sufficient sources for streaming range.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair. Our consistency requirements aren't high but we prefer 
 as much consistency as cassandra can give us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks and when we started back up, some 
 nodes ran out of disk space, due to operator miscalculation. Thereafter, 
 we've been unable to bootstrap new nodes.  But bootstrapping with partial 
 success would be far better than being unable to bootstrap at all, and 
 cheaper than a repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8591:
---

 Summary: Tunable bootstrapping
 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith


Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to unable to find sufficient sources for 
streaming range.  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair. Our 
consistency requirements are low but not zero.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to unable to find sufficient sources for 
streaming range.  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks and when we started back up, some 
 nodes ran out of disk space, due to operator miscalculation. Thereafter, 
 we've been unable to bootstrap new nodes, due to unable to find sufficient 
 sources for streaming range.  But bootstrapping with partial success would 
 be far better than being unable to bootstrap at all, and cheaper than a 
 repair. Our consistency requirements are low but not zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes, due to unable to find sufficient sources for 
streaming range.  But bootstrapping with partial success would be far better 
than being unable to bootstrap at all, and cheaper than a repair.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

Same with rebuilds.

We were doing maintenance on some disks and when we started back up, some nodes 
ran out of disk space, due to operator miscalculation. Thereafter, we've been 
unable to bootstrap new nodes.  But bootstrapping with partial success would be 
far better than being unable to bootstrap at all, and cheaper than a repair.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.
 Same with rebuilds.
 We were doing maintenance on some disks and when we started back up, some 
 nodes ran out of disk space, due to operator miscalculation. Thereafter, 
 we've been unable to bootstrap new nodes, due to unable to find sufficient 
 sources for streaming range.  But bootstrapping with partial success would 
 be far better than being unable to bootstrap at all, and cheaper than a 
 repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters (up to 100 failures, for example), and should print out a 
report about what ranges were missing.  For many apps, it's far better to 
bootstrap what's available than to fail flat.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tuneable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail flat.


 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tuneable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Summary: Tuneable bootstrapping  (was: Tunable bootstrapping)

 Tuneable bootstrapping
 --

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters, and should print out a report about what ranges 
 were missing.  For many apps, it's far better to bootstrap what's available 
 than to fail flat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Summary: Tunable bootstrapping  (was: Tuneable bootstrapping)

 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tuneable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters, and should print out a report about what ranges 
 were missing.  For many apps, it's far better to bootstrap what's available 
 than to fail outright.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tuneable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Description: 
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tuneable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail outright.

  was:
Often bootstrapping fails due to errors like unable to find sufficient sources 
for streaming range. But cassandra is supposed to be fault tolerant, and it's 
supposed to have tunable consistency.

If it can't find some sources, it should allow bootstrapping to continue, under 
control by parameters, and should print out a report about what ranges were 
missing.  For many apps, it's far better to bootstrap what's available than to 
fail outright.


 Tuneable bootstrapping
 --

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tuneable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters, and should print out a report about what ranges 
 were missing.  For many apps, it's far better to bootstrap what's available 
 than to fail outright.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8591) Tunable bootstrapping

2015-01-09 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8591:

Priority: Minor  (was: Major)

 Tunable bootstrapping
 -

 Key: CASSANDRA-8591
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8591
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often bootstrapping fails due to errors like unable to find sufficient 
 sources for streaming range. But cassandra is supposed to be fault tolerant, 
 and it's supposed to have tunable consistency.
 If it can't find some sources, it should allow bootstrapping to continue, 
 under control by parameters (up to 100 failures, for example), and should 
 print out a report about what ranges were missing.  For many apps, it's far 
 better to bootstrap what's available than to fail outright.
 Same with rebuilds.
 We were doing maintenance on some disks and when we started back up, some 
 nodes ran out of disk space, due to operator miscalculation. Thereafter, 
 we've been unable to bootstrap new nodes, due to "unable to find sufficient 
 sources for streaming range".  But bootstrapping with partial success would 
 be far better than being unable to bootstrap at all, and cheaper than a 
 repair. Our consistency requirements are low but not zero.
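To make the request concrete, here is a purely hypothetical sketch of what such 
knobs might look like in cassandra.yaml (these settings do not exist in any 
release; the names are illustrative only):
{noformat}
# Hypothetical settings (not part of any shipped cassandra.yaml).
# Keep bootstrapping even when some ranges have no reachable source.
# bootstrap_allow_missing_ranges: true
# Give up anyway once this many ranges could not be streamed.
# bootstrap_max_missing_ranges: 100
{noformat}
At the end of the bootstrap, the node would then log a report listing each range 
it could not stream.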



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261238#comment-14261238
 ] 

Donald Smith commented on CASSANDRA-8245:
-

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover.   This is with C* 2.0.11. We have two other DCs; ntpd is running on 
all nodes. But all nodes in one DC are down now.
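(As a side note for anyone tracking the same symptom: the Gossip stage backlog 
can be watched with *nodetool tpstats*, which reports Active/Pending/Completed 
counts per stage. The numbers below are illustrative only, and the exact column 
layout varies slightly between versions.)
{noformat}
$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
GossipStage                       0     12345         6789012         0                 0
...
{noformat}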

 Cassandra nodes periodically die in 2-DC configuration
 --

 Key: CASSANDRA-8245
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Scientific Linux release 6.5
 java version 1.7.0_51
 Cassandra 2.0.9
Reporter: Oleg Poleshuk
Assignee: Brandon Williams
Priority: Minor
 Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
 stack5.txt


 We have 2 DCs with 3 nodes in each.
 Second DC periodically has 1-2 nodes down.
 Looks like it loses connectivity with other nodes and then Gossiper starts 
 to accumulate tasks until Cassandra dies with OOM.
 WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
 live ratio to maximum of 64.0 instead of Infinity
  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
 stage has 1 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
 stage has 4 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
 stage has 8 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
 stage has 11 pending tasks; skipping status check (no nodes will be marked 
 down)
 ...
 WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
 stage has 1014764 pending tasks; skipping status check (no nodes will be 
 marked down)
  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
 Unexpected exception in the selector loop.
 java.lang.OutOfMemoryError: Java heap space
 Also these lines appear, but not sure they are relevant:
 DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
 Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261238#comment-14261238
 ] 

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 5:20 PM:
---

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover.   This is with C* 2.0.11. We have two other DCs; ntpd is running on 
all nodes. But all nodes in one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}


was (Author: thinkerfeeler):
We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover.   This is with C* 2.0.11. We have two other DCs; ntpd is running on 
all nodes. But all nodes in one DC are down now.

 Cassandra nodes periodically die in 2-DC configuration
 --

 Key: CASSANDRA-8245
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Scientific Linux release 6.5
 java version 1.7.0_51
 Cassandra 2.0.9
Reporter: Oleg Poleshuk
Assignee: Brandon Williams
Priority: Minor
 Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
 stack5.txt


 We have 2 DCs with 3 nodes in each.
 Second DC periodically has 1-2 nodes down.
 Looks like it loses connectivity with other nodes and then Gossiper starts 
 to accumulate tasks until Cassandra dies with OOM.
 WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
 live ratio to maximum of 64.0 instead of Infinity
  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
 stage has 1 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
 stage has 4 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
 stage has 8 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
 stage has 11 pending tasks; skipping status check (no nodes will be marked 
 down)
 ...
 WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
 stage has 1014764 pending tasks; skipping status check (no nodes will be 
 marked down)
  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
 Unexpected exception in the selector loop.
 java.lang.OutOfMemoryError: Java heap space
 Also these lines appear, but not sure they are relevant:
 DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
 Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8245) Cassandra nodes periodically die in 2-DC configuration

2014-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261238#comment-14261238
 ] 

Donald Smith edited comment on CASSANDRA-8245 at 12/30/14 7:36 PM:
---

We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover.   This is with C* 2.0.11. We have two other DCs; ntpd is running on 
all nodes. But all nodes in one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.

Prior to getting OutOfMemory, I notice that such nodes are slow in responding 
to commands and queries (e.g., jmx).
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}


was (Author: thinkerfeeler):
We're getting a similar increase in the number of pending Gossip stage tasks, 
followed by OutOfMemory.  This happens once a day or so on some node of our 38 
node DC.   Other nodes have increases in pending Gossip stage tasks but they 
recover.   This is with C* 2.0.11. We have two other DCs; ntpd is running on 
all nodes. But all nodes in one DC are down now.

What's odd is that the cassandra process continues running despite the 
OutOfMemory exception.  You'd expect it to exit.
{noformat}
WARN [GossipTasks:1] 2014-12-26 02:45:06,204 Gossiper.java (line 648) Gossip 
stage has 2695 pending tasks; skipping status check (no nodes will be marked 
down)
ERROR [Thread-49234] 2014-12-26 07:18:42,281 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49234,5,main]
java.lang.OutOfMemoryError: Java heap space

ERROR [Thread-49235] 2014-12-26 07:18:42,291 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thread-49235,5,main]
java.lang.OutOfMemoryError: Java heap space
...
{noformat}

 Cassandra nodes periodically die in 2-DC configuration
 --

 Key: CASSANDRA-8245
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8245
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Scientific Linux release 6.5
 java version 1.7.0_51
 Cassandra 2.0.9
Reporter: Oleg Poleshuk
Assignee: Brandon Williams
Priority: Minor
 Attachments: stack1.txt, stack2.txt, stack3.txt, stack4.txt, 
 stack5.txt


 We have 2 DCs with 3 nodes in each.
 Second DC periodically has 1-2 nodes down.
 Looks like it loses connectivity with other nodes and then Gossiper starts 
 to accumulate tasks until Cassandra dies with OOM.
 WARN [MemoryMeter:1] 2014-08-12 14:34:59,803 Memtable.java (line 470) setting 
 live ratio to maximum of 64.0 instead of Infinity
  WARN [GossipTasks:1] 2014-08-12 14:44:34,866 Gossiper.java (line 637) Gossip 
 stage has 1 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:35,968 Gossiper.java (line 637) Gossip 
 stage has 4 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:37,070 Gossiper.java (line 637) Gossip 
 stage has 8 pending tasks; skipping status check (no nodes will be marked 
 down)
  WARN [GossipTasks:1] 2014-08-12 14:44:38,171 Gossiper.java (line 637) Gossip 
 stage has 11 pending tasks; skipping status check (no nodes will be marked 
 down)
 ...
 WARN [GossipTasks:1] 2014-10-06 21:42:51,575 Gossiper.java (line 637) Gossip 
 stage has 1014764 pending tasks; skipping status check (no nodes will be 
 marked down)
  WARN [New I/O worker #13] 2014-10-06 21:54:27,010 Slf4JLogger.java (line 76) 
 Unexpected exception in the selector loop.
 java.lang.OutOfMemoryError: Java heap space
 Also these lines appear, but not sure they are relevant:
 DEBUG [GossipStage:1] 2014-08-12 11:33:18,801 FailureDetector.java (line 338) 
 Ignoring interval time of 2085963047



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-08 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238029#comment-14238029
 ] 

Donald Smith commented on CASSANDRA-8433:
-

Maybe I could use the 'recent' metrics if I knew which ones were 'lifetime' 
and which ones were 'recent'.   Also, how often do the 'recent' metrics reset?  
It doesn't seem to say here: http://wiki.apache.org/cassandra/Metrics .

 Add jmx and nodetool controls to reset lifetime metrics to zero
 ---

 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often I change some parameter in cassandra, in the OS, or in an external 
 component and want to see the effect on cassandra performance.  Because some of 
 the jmx metrics are for the lifetime of the process, it's hard to see the 
 effect of changes.  It's inconvenient to restart all the nodes. And if you 
 restart only some nodes (as I often do) then only those metrics reset to zero.
 The jmx interface should provide a way to reset all lifetime metrics to zero. 
  And *nodetool* should invoke that to allow resetting metrics from the 
 command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-08 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14238060#comment-14238060
 ] 

Donald Smith commented on CASSANDRA-8433:
-

Ideally, the output from jmx and nodetool would better document what the fields 
mean and what time period they cover. It's unclear whether some of the latencies 
refer to coordinator-node latency for client requests or to local disk latency.

I get the impression that the Mean latency is a lifetime value.  But how do I 
know? By reading the source code? That's something I'd like to reset to zero.
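For what it's worth, the attributes in question can be inspected directly over 
JMX. A minimal sketch, assuming the default JMX port 7199 and the ClientRequest 
read-latency MBean documented on the metrics wiki (attribute names come from the 
metrics library's JMX reporter and may differ slightly between versions):
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadLatencyProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra's default JMX port is 7199; adjust host/port as needed.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Coordinator-level read latency timer, per the Cassandra metrics docs.
            // Attribute names ("Count", "Mean", ...) may vary between versions.
            ObjectName name = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
            Object count = mbs.getAttribute(name, "Count"); // lifetime count of reads
            Object mean  = mbs.getAttribute(name, "Mean");  // mean latency reported by the timer
            System.out.println("Read latency: Count=" + count + ", Mean=" + mean);
        } finally {
            connector.close();
        }
    }
}
{code}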

 Add jmx and nodetool controls to reset lifetime metrics to zero
 ---

 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often I change some parameter in cassandra, in the OS, or in an external 
 component and want to see the effect on cassandra performance.  Because some of 
 the jmx metrics are for the lifetime of the process, it's hard to see the 
 effect of changes.  It's inconvenient to restart all the nodes. And if you 
 restart only some nodes (as I often do) then only those metrics reset to zero.
 The jmx interface should provide a way to reset all lifetime metrics to zero. 
  And *nodetool* should invoke that to allow resetting metrics from the 
 command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8433) Add jmx control to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8433:
---

 Summary: Add jmx control to reset lifetime metrics to zero
 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor


Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and mdoetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Summary: Add jmx and mdoetool controls to reset lifetime metrics to zero  
(was: Add jmx control to reset lifetime metrics to zero)

 Add jmx and mdoetool controls to reset lifetime metrics to zero
 ---

 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often I change some parameter in cassandra, in the OS, or in an external 
 component and want to see the effect on cassandra performance.  Because some of 
 the jmx metrics are for the lifetime of the process, it's hard to see the 
 effect of changes.  It's inconvenient to restart all the nodes. And if you 
 restart only some nodes (as I often do) then only those metrics reset to zero.
 The jmx interface should provide a way to reset all lifetime metrics to zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and mdoetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Description: 
Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.  
And *nodetool* should invoke that to allow resetting metrics from the command 
line.


  was:
Often I change some parameter in cassandra, in the OS, or in an external 
component and want to see the effect on cassandra performance.  Because some of 
the jmx metrics are for the lifetime of the process, it's hard to see the 
effect of changes.  It's inconvenient to restart all the nodes. And if you 
restart only some nodes (as I often do) then only those metrics reset to zero.

The jmx interface should provide a way to reset all lifetime metrics to zero.


 Add jmx and mdoetool controls to reset lifetime metrics to zero
 ---

 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often I change some parameter in cassandra, in the OS, or in an external 
 component and want to see the effect on cassandra performance.  Because some of 
 the jmx metrics are for the lifetime of the process, it's hard to see the 
 effect of changes.  It's inconvenient to restart all the nodes. And if you 
 restart only some nodes (as I often do) then only those metrics reset to zero.
 The jmx interface should provide a way to reset all lifetime metrics to zero. 
  And *nodetool* should invoke that to allow resetting metrics from the 
 command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8433) Add jmx and nodetool controls to reset lifetime metrics to zero

2014-12-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8433:

Summary: Add jmx and nodetool controls to reset lifetime metrics to zero  
(was: Add jmx and mdoetool controls to reset lifetime metrics to zero)

 Add jmx and nodetool controls to reset lifetime metrics to zero
 ---

 Key: CASSANDRA-8433
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8433
 Project: Cassandra
  Issue Type: Improvement
Reporter: Donald Smith
Priority: Minor

 Often I change some parameter in cassandra, in the OS, or in an external 
 component and want to see the effect on cassandra performance.  Because some of 
 the jmx metrics are for the lifetime of the process, it's hard to see the 
 effect of changes.  It's inconvenient to restart all the nodes. And if you 
 restart only some nodes (as I often do) then only those metrics reset to zero.
 The jmx interface should provide a way to reset all lifetime metrics to zero. 
  And *nodetool* should invoke that to allow resetting metrics from the 
 command line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-18 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216530#comment-14216530
 ] 

Donald Smith commented on CASSANDRA-7830:
-

(BTW, the decommission failed after an hour with a RuntimeException: Stream 
failed.  I tried again and it failed again with the same exception, after 
about an hour and 15 minutes.  There was no load on the cluster at all.  The 
nodes in the DC were all up.  I gave up, stopped the process, and am running 
nodetool removenode ID from another node.)

 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 {code}Exception in thread main java.lang.UnsupportedOperationException: 
 data is currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722){code}
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215093#comment-14215093
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Yes, I'm seeing this with 2.0.11:
{noformat}
Exception in thread main java.lang.UnsupportedOperationException: data is 
currently moving to this node; unable to leave the ring
at 
org.apache.cassandra.service.StorageService.decommission(StorageService.java:2912)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
...
{noformat}
And *nodetool netstats* shows:
{noformat}
dc1-cassandra13.dc01 ~ nodetool netstats
Mode: NORMAL
Restore replica count d7efb410-6c58-11e4-896c-a1382b792927
Read Repair Statistics:
Attempted: 1123
Mismatch (Blocking): 0
Mismatch (Background): 540
Pool Name                    Active   Pending      Completed
Commandsn/a 0 1494743209
Responses   n/a 1 1651558975
{noformat}


 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 Exception in thread main java.lang.UnsupportedOperationException: data is 
 currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722)
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215101#comment-14215101
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Stopping and restarting the cassandra process did not help.

 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 {code}Exception in thread main java.lang.UnsupportedOperationException: 
 data is currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722){code}
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215101#comment-14215101
 ] 

Donald Smith edited comment on CASSANDRA-7830 at 11/17/14 8:29 PM:
---

Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped the process.


was (Author: thinkerfeeler):
Stopping and restarting the cassandra process did not help.

 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 {code}Exception in thread main java.lang.UnsupportedOperationException: 
 data is currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722){code}
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215101#comment-14215101
 ] 

Donald Smith edited comment on CASSANDRA-7830 at 11/17/14 8:30 PM:
---

Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped and restarted the process.


was (Author: thinkerfeeler):
Stopping and restarting the cassandra process did not help.

Also, I tried it on two other nodes and it didn't work there either, even when 
I first stopped the process.

 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 {code}Exception in thread main java.lang.UnsupportedOperationException: 
 data is currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722){code}
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7830) Decommissioning fails on a live node

2014-11-17 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215136#comment-14215136
 ] 

Donald Smith commented on CASSANDRA-7830:
-

Following the advice in 
http://comments.gmane.org/gmane.comp.db.cassandra.user/5554, I stopped all 
nodes and restarted. Now the decommission is working. So this is a workaround.
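(For reference, the commands involved are standard nodetool subcommands; they are 
listed here only as a summary of the workaround.)
{noformat}
# On the node that is leaving the ring (the node must be up):
nodetool decommission

# If the node is already down, from any other live node, using the host ID
# reported by nodetool status:
nodetool removenode <host-id>

# To watch streaming progress during either operation:
nodetool netstats
{noformat}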

 Decommissioning fails on a live node
 

 Key: CASSANDRA-7830
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7830
 Project: Cassandra
  Issue Type: Bug
Reporter: Ananthkumar K S

 {code}Exception in thread main java.lang.UnsupportedOperationException: 
 data is currently moving to this node; unable to leave the ring at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2629)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
  at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
  at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235) 
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at 
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250) at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
  at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:791) at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
  at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
  at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)
  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source) at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601) at 
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at 
 sun.rmi.transport.Transport$1.run(Transport.java:177) at 
 sun.rmi.transport.Transport$1.run(Transport.java:174) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 sun.rmi.transport.Transport.serviceCall(Transport.java:173) at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556) at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
  at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:722){code}
 I got the following exception when I was trying to decommission a live node. 
 There is no reference in the manual saying that I need to stop the data 
 coming into this node. Even so, decommissioning is supposed to work on live nodes.
 Can anyone let me know if I am doing something wrong or if this is a bug on 
 Cassandra's part?
 Cassandra Version Used : 2.0.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8310) Assertion error in 2.1.1: SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8310:
---

 Summary: Assertion error in 2.1.1: 
SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)
 Key: CASSANDRA-8310
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8310
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Donald Smith


Using C* 2.1.1 on Linux CentOS 6.4, we're getting this AssertionError on 5 
nodes in a 12-node cluster. Also, compactions are lagging on all nodes.
{noformat}
ERROR [IndexSummaryManager:1] 2014-11-13 09:15:16,221 CassandraDaemon.java 
(line 153) Exception in thread Thread[IndexSummaryManager:1,1,main]
java.lang.AssertionError: null
at 
org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:988)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.adjustSamplingLevels(IndexSummaryManager.java:420)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:298)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:238)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:139)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:77)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_60]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
[na:1.7.0_60]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
 [na:1.7.0_60]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 [na:1.7.0_60]
{noformat} 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8311) C* 2.1.1: AssertionError in AbstractionCompactionTask not correctly marked compacting

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8311:
---

 Summary: C* 2.1.1:  AssertionError in AbstractionCompactionTask 
not correctly marked compacting
 Key: CASSANDRA-8311
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8311
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


Using 2.1.1 on CentOS 6.4, we see this AssertionError on 3 out of 12 nodes in 
one DC.
{noformat}
ERROR [CompactionExecutor:7] 2014-11-12 10:15:13,980 CassandraDaemon.java (line 
153) Exception in thread Thread[CompactionExecutor:7,1,RMI Runtime]
java.lang.AssertionError: 
/data/data/KEYSPACE_NAME/TABLE_NAME/KEYSPACE_NAME-TABLE_NAME-jb-308572-Data.db 
is not correctly marked compacting
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.init(AbstractCompactionTask.java:49)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.CompactionTask.init(CompactionTask.java:62)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.init(LeveledCompactionTask.java:33)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getCompactionTask(LeveledCompactionStrategy.java:170)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8314) C* 2.1.1: AssertionError: stream can only read forward

2014-11-13 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8314:
---

 Summary: C* 2.1.1: AssertionError:  stream can only read forward
 Key: CASSANDRA-8314
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8314
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Donald Smith


I see this on multiple nodes of a 2.1.1 cluster running on CentOS 6.4:
{noformat}
ERROR [STREAM-IN-/10.6.1.104] 2014-11-13 14:13:16,565 StreamSession.java (line 
470) [Stream #45bdfe30-6b81-11e4-a7ca-b150b4554347] Streaming error occurred
java.io.IOException: Too many retries for Header (cfId: 
aaefa7d7-9d72-3d18-b5f0-02b30cee5bd7, #29, version: jb, estimated keys: 12672, 
transfer size: 130005779, compressed?: true, repairedAt: 0)
at 
org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:594) 
[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:53)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
 [apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
 [apache-cassandra-2.1.1.jar:2.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
Caused by: java.lang.AssertionError: stream can only read forward.
at 
org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:107)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:85)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
 [apache-cassandra-2.1.1.jar:2.1.1]
... 4 common frames omitted
{noformat}

We couldn't upgrade SSTables due to exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8314) C* 2.1.1: AssertionError: stream can only read forward

2014-11-13 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14211469#comment-14211469
 ] 

Donald Smith commented on CASSANDRA-8314:
-

BTW, earlier we got this exception when running "nodetool upgradesstables":
{noformat}
java.lang.NullPointerException
at 
org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewStart(SSTableReader.java:951)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.moveStarts(SSTableRewriter.java:238)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:180)
at 
org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:109)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:183)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:75)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$4.execute(CompactionManager.java:340)
at 
org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

 C* 2.1.1: AssertionError:  stream can only read forward
 -

 Key: CASSANDRA-8314
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8314
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Donald Smith

 I see this on multiple nodes of a 2.1.1 cluster running on CentOS 6.4:
 {noformat}
 ERROR [STREAM-IN-/10.6.1.104] 2014-11-13 14:13:16,565 StreamSession.java 
 (line 470) [Stream #45bdfe30-6b81-11e4-a7ca-b150b4554347] Streaming error 
 occurred
 java.io.IOException: Too many retries for Header (cfId: 
 aaefa7d7-9d72-3d18-b5f0-02b30cee5bd7, #29, version: jb, estimated keys: 
 12672, transfer size: 130005779, compressed?: true, repairedAt: 0)
 at 
 org.apache.cassandra.streaming.StreamSession.doRetry(StreamSession.java:594) 
 [apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:53)
  [apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38)
  [apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55)
  [apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
  [apache-cassandra-2.1.1.jar:2.1.1]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_60]
 Caused by: java.lang.AssertionError: stream can only read forward.
 at 
 org.apache.cassandra.streaming.compress.CompressedInputStream.position(CompressedInputStream.java:107)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:85)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48)
  [apache-cassandra-2.1.1.jar:2.1.1]
 ... 4 common frames omitted
 {noformat}
 We couldn't upgrade SSTables due to exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8060) Geography-aware replication

2014-10-05 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-8060:
---

 Summary: Geography-aware replication
 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith


We have three data centers in the US (CA in California, TX in Texas, and NJ in 
New Jersey), two in Europe (UK and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

It would be far better if we had the option of setting things up so that CA replicated to 
TX, which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc., etc.

This could be controlled by the topology file.

It would have major ramifications for latency and architecture but might be 
appropriate for some scenarios.
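
For context, replication fan-out is currently declared per keyspace, and the coordinator 
in the written-to data center ships replicas to every other data center itself; a daisy 
chain like the one above would additionally need some way to express who forwards to 
whom, for which no syntax exists today. A minimal sketch of the current model (the 
keyspace name and replication factors below are made up):
{noformat}
-- Current model: the CA coordinator pushes replicas to every listed DC directly
CREATE KEYSPACE reports WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'CA': 3, 'TX': 3, 'NJ': 3, 'UK': 3, 'DE': 3, 'JP': 3, 'CH1': 3
};
{noformat}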



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Summary: Geography-aware, daisy-chaining replication  (was: Geography-aware 
replication)

 Geography-aware, daisy-chaining replication
 ---

 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith

 We have three data centers in the US (CA in California, TX in Texas, and NJ 
 in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
 our writing to CA.  That represents a bottleneck, since the coordinator nodes 
 in CA are responsible for all the replication to every data center.
 Far better if we had the option of setting things up so that CA replicated to 
 TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
 for replicating to UK, which should replicate to DE.  Etc, etc.
 This could be controlled by the topology file.
 It would have major ramifications for latency architecture but might be 
 appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Description: 
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.

  was:
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would have major ramifications for latency architecture but might be 
appropriate for some scenarios.


 Geography-aware, daisy-chaining replication
 ---

 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith

 We have three data centers in the US (CA in California, TX in Texas, and NJ 
 in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
 our writing to CA.  That represents a bottleneck, since the coordinator nodes 
 in CA are responsible for all the replication to every data center.
 Far better if we had the option of setting things up so that CA replicated to 
 TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
 for replicating to UK, which should replicate to DE.  Etc, etc.
 This could be controlled by the topology file.
 It would require architectural changes and would have major ramifications for 
 latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, distributed replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Summary: Geography-aware, distributed replication  (was: Geography-aware, 
daisy-chaining replication)

 Geography-aware, distributed replication
 

 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith

 We have three data centers in the US (CA in California, TX in Texas, and NJ 
 in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
 our writing to CA.  That represents a bottleneck, since the coordinator nodes 
 in CA are responsible for all the replication to every data center.
 Far better if we had the option of setting things up so that CA replicated to 
 TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
 for replicating to UK, which should replicate to DE.  Etc, etc.
 This could be controlled by the topology file.
 The replication could be organized in a tree-like structure instead of a 
 daisy-chain.
 It would require architectural changes and would have major ramifications for 
 latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8060) Geography-aware, daisy-chaining replication

2014-10-05 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-8060:

Description: 
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

The replication could be organized in a tree-like structure instead of a 
daisy-chain.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.

  was:
We have three data centers in the US (CA in California, TX in Texas, and NJ in 
NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all our 
writing to CA.  That represents a bottleneck, since the coordinator nodes in CA 
are responsible for all the replication to every data center.

Far better if we had the option of setting things up so that CA replicated to 
TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
for replicating to UK, which should replicate to DE.  Etc, etc.

This could be controlled by the topology file.

It would require architectural changes and would have major ramifications for 
latency but might be appropriate for some scenarios.


 Geography-aware, daisy-chaining replication
 ---

 Key: CASSANDRA-8060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8060
 Project: Cassandra
  Issue Type: Wish
Reporter: Donald Smith

 We have three data centers in the US (CA in California, TX in Texas, and NJ 
 in NJ), two in Europe (UK  and DE), and two in Asia (JP and CH1).  We do all 
 our writing to CA.  That represents a bottleneck, since the coordinator nodes 
 in CA are responsible for all the replication to every data center.
 Far better if we had the option of setting things up so that CA replicated to 
 TX , which replicated to NJ. NJ is closer to UK, so NJ should be responsible 
 for replicating to UK, which should replicate to DE.  Etc, etc.
 This could be controlled by the topology file.
 The replication could be organized in a tree-like structure instead of a 
 daisy-chain.
 It would require architectural changes and would have major ramifications for 
 latency but might be appropriate for some scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay

2014-09-28 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151127#comment-14151127
 ] 

Donald Smith commented on CASSANDRA-6666:
-

I know this is moot because of the redesign of hints, but I want to understand 
this. OK, if a hint was successfully delivered, then I can see how a tombstone 
would be useful for causing deletion of *older* instances in other sstables.  
But if a hint timed-out (tombstone), then any older instance will also have 
timed out (presumably). So, could tombstones be deleted in that case (timeout)? 
 Perhaps a timed out cell IS a tombstone, but my point is: I don't see why they 
need to take up space.

 Avoid accumulating tombstones after partial hint replay
 ---

 Key: CASSANDRA-6666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
  Labels: hintedhandoff
 Attachments: .txt, cassandra_system.log.debug.gz






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6666) Avoid accumulating tombstones after partial hint replay

2014-09-19 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141239#comment-14141239
 ] 

Donald Smith commented on CASSANDRA-6666:
-

Perhaps this question is inappropriate here. But can tombstones be completely 
omitted for system.hints, given that they're not replicated and given that only 
internal code modifies them in normal operation?  If a hint is delivered 
successfully, why does it need a tombstone at all?  If it times out, then 
cassandra is going to give up on delivering it. So, again, why does it need a 
tombstone?   On the cassandra irc channel, several people speculated that 
the cassandra developers didn't want to make a *special case* for system.hints. 
  Also, system.hints has *gc_grace_seconds=0*, so they won't survive a 
compaction, presumably.  I realize that in C* 3.0, hints will be moved out 
of tables, but I am still perplexed why tombstones are needed at all for hints. 
  My apologies if this is a dumb question.
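
For reference, the gc_grace_seconds=0 setting on system.hints can be confirmed from 
cqlsh on a 2.x cluster (output abbreviated here; exact formatting varies by version):
{noformat}
cqlsh> DESCRIBE TABLE system.hints;
...
    AND gc_grace_seconds = 0
...
{noformat}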

 Avoid accumulating tombstones after partial hint replay
 ---

 Key: CASSANDRA-6666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
  Labels: hintedhandoff
 Fix For: 2.0.11

 Attachments: .txt, cassandra_system.log.debug.gz






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969886#comment-13969886
 ] 

Donald Smith commented on CASSANDRA-7034:
-

Benedict, I'm aware that *commitlog_total_space_in_mb* has that purpose.  What 
I'm raising is the issue that this comment in cassandra.yaml is now wrong: "the 
default size is 32 on 32-bit JVMs, and 1024 on 64-bit JVMs".  That's no longer 
being enforced.

 commitlog files are 32MB in size, even with a 64bit  OS and jvm
 ---

 Key: CASSANDRA-7034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith

 We did an rpm install of cassandra 2.0.6 on CentOS 6.4 running 
 {noformat}
  java -version
 Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
 Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
 {noformat}
 That is the version of java CassandraDaemon is using.
 We used the default setting (None) in cassandra.yaml for 
 commitlog_total_space_in_mb:
 {noformat}
 # Total space to use for commitlogs.  Since commitlog segments are
 # mmapped, and hence use up address space, the default size is 32
 # on 32-bit JVMs, and 1024 on 64-bit JVMs.
 #
 # If space gets above this value (it will round up to the next nearest
 # segment multiple), Cassandra will flush every dirty CF in the oldest
 # segment and remove it.  So a small total commitlog space will tend
 # to cause more flush activity on less-active columnfamilies.
 # commitlog_total_space_in_mb: 4096
 {noformat}
 But our commitlog files are 32MB in size, not 1024MB.
 OpsCenter confirms that commitlog_total_space_in_mb is None.
 I don't think the problem is in cassandra-env.sh, because when I run it 
 manually and echo the  values of the version variables I get:
 {noformat}
 jvmver=1.7.0_40
 JVM_VERSION=1.7.0
 JVM_ARCH=64-Bit
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7034) commitlog files are 32MB in size, even with a 64bit OS and jvm

2014-04-14 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-7034:
---

 Summary: commitlog files are 32MB in size, even with a 64bit  OS 
and jvm
 Key: CASSANDRA-7034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7034
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


We did an rpm install of cassandra 2.0.6 on CentOS 6.4 running 
{noformat}
 java -version
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)
{noformat}
That is the version of java CassandraDaemon is using.

We used the default setting (None) in cassandra.yaml for 
commitlog_total_space_in_mb:
{noformat}
# Total space to use for commitlogs.  Since commitlog segments are
# mmapped, and hence use up address space, the default size is 32
# on 32-bit JVMs, and 1024 on 64-bit JVMs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.  So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
# commitlog_total_space_in_mb: 4096
{noformat}
But our commitlog files are 32MB in size, not 1024MB.

OpsCenter confirms that commitlog_total_space_in_mb is None.

I don't think the problem is in cassandra-env.sh, because when I run it 
manually and echo the  values of the version variables I get:
{noformat}
jvmver=1.7.0_40
JVM_VERSION=1.7.0
JVM_ARCH=64-Bit
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-6929) Corrupted Index File: read 8599 but expected 8600 chunks.

2014-03-25 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6929:
---

 Summary: Corrupted Index File: read 8599 but expected 8600 chunks.
 Key: CASSANDRA-6929
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6929
 Project: Cassandra
  Issue Type: Bug
Reporter: Donald Smith


I have a 3-node cassandra cluster running 2.0.6 (we started at 2.0.1). It has 
several terabytes of data. We've been seeing exceptions in system.log like 
"Corrupted Index File ... read 21478 but expected 21479 chunks."



Here's a stack trace from one server:
{noformat}
 INFO [CompactionExecutor:9109] 2014-03-24 06:55:28,148 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1299803435(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,148 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1299803435(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,299 Memtable.java (line 371) 
Completed flushing 
/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1395195644764, 
position=17842243)
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,299 CompactionTask.java 
(line 115) Compacting 
[SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12861-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12860-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12858-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db')]
ERROR [CompactionExecutor:9109] 2014-03-24 06:55:28,302 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:9109,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:152)
at 
org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:106)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:330)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
... 16 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:146)
... 15 more
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,739 CompactionTask.java 
(line 275) Compacted 4 sstables to 
[/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12863,].
  571 bytes to 42 (~7% of original) in 439ms = 0.91MB/s.  4 total 
partitions merged to 1.  Partition merge counts were {2:2, }
{noformat}
Here's another example:



{noformat}
 INFO [CompactionExecutor:9566] 2014-03-25 06:32:02,234 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1216289160(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,234 Memtable.java (line 

[jira] [Updated] (CASSANDRA-6929) Corrupted Index File: read 8599 but expected 8600 chunks.

2014-03-25 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-6929:


Description: 
I have a 3 node cassandra cluster running 2.0.6 (we started at 2.0.1). It has 
several terabytes of data. We've been seeing exceptions in system.log like 
Corrupted Index File ... read 21478 but expected 21479 chunks.



Here's a stack trace from one server:
{noformat}
 INFO [CompactionExecutor:9109] 2014-03-24 06:55:28,148 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1299803435(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,148 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1299803435(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:496] 2014-03-24 06:55:28,299 Memtable.java (line 371) 
Completed flushing 
/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db
 (42 bytes) for commitlog position ReplayPosition(segmentId=1395195644764, 
position=17842243)
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,299 CompactionTask.java 
(line 115) Compacting 
[SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12861-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12860-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12858-Data.db'),
 
SSTableReader(path='/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12862-Data.db')]
ERROR [CompactionExecutor:9109] 2014-03-24 06:55:28,302 CassandraDaemon.java 
(line 196) Exception in thread Thread[CompactionExecutor:9109,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:152)
at 
org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:106)
at 
org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
at 
org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:330)
at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Corrupted Index File 
/mnt/cassandra-storage/data/as_reports/data_hierarchy_details/as_reports-data_hierarchy_details-jb-55104-CompressionInfo.db:
 read 21478 but expected 21479 chunks.
... 16 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(Unknown Source)
at java.io.DataInputStream.readLong(Unknown Source)
at 
org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:146)
... 15 more
 INFO [CompactionExecutor:9142] 2014-03-24 06:55:28,739 CompactionTask.java 
(line 275) Compacted 4 sstables to 
[/mnt/cassandra-storage/data/system/compactions_in_progress/system-compactions_in_progress-jb-12863,].
  571 bytes to 42 (~7% of original) in 439ms = 0.91MB/s.  4 total 
partitions merged to 1.  Partition merge counts were {2:2, }
{noformat}
Here's another example:



{noformat}
 INFO [CompactionExecutor:9566] 2014-03-25 06:32:02,234 ColumnFamilyStore.java 
(line 785) Enqueuing flush of Memtable-compactions_in_progress@1216289160(0/0 
serialized/live bytes, 1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,234 Memtable.java (line 331) 
Writing Memtable-compactions_in_progress@1216289160(0/0 serialized/live bytes, 
1 ops)
 INFO [FlushWriter:474] 2014-03-25 06:32:02,445 

[jira] [Created] (CASSANDRA-6611) Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6611:
---

 Summary: Allow for FINAL ttls and FINAL (immutable) inserts to 
eliminate the need for tombstones
 Key: CASSANDRA-6611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith


Suppose you're not allowed to update the TTL of a column (cell) -- either 
because CQL is extended to allow syntax like USING *FINAL* TTL 86400 or 
because there was a table option saying that TTL is immutable.

If you never update the TTL of a column, then there should be no need for 
tombstones at all:  any replicas will have the same TTL.  So there’d be no risk 
of missed deletes.  You wouldn’t even need GCable tombstones.  The purpose of a 
tombstone is to cover the case where a different node was down and it didn’t 
notice the delete and it still had the column and tried to replicate it back; 
but that won’t happen if it too had the TTL.

So, if – and it’s a big if – a table disallowed updates to TTL, then you could 
really optimize deletion of TTLed columns: you could do away with tombstones 
entirely.   If a table allows updates to TTL then it’s possible a different 
node will have the row without the TTL and the tombstone would be needed.

Or am I missing something?

Disallowing updates to rows would seem to enable optimizations in general.   
Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
Probably other optimizations would be enabled too.






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6611) Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879340#comment-13879340
 ] 

Donald Smith commented on CASSANDRA-6611:
-

Would it be better, then, to enforce this at the schema level, in a CREATE 
TABLE statement?

 Allow for FINAL ttls and FINAL (immutable) inserts to eliminate the need for 
 tombstones
 ---

 Key: CASSANDRA-6611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith

 Suppose you're not allowed to update the TTL of a column (cell) -- either 
 because CQL is extended to allow syntax like USING *FINAL* TTL 86400 or 
 because there were a table option saying that TTL is immutable.
 If you never update the TTL of a column, then there should be no need for 
 tombstones at all:  any replicas will have the same TTL.  So there’d be no 
 risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
 purpose of a tombstone is to cover the case where a different node was down 
 and it didn’t notice the delete and it still had the column and tried to 
 replicate it back; but that won’t happen if it too had the TTL.
 So, if – and it’s a big if – a table disallowed updates to TTL, then you 
 could really optimize deletion of TTLed columns: you could do away with 
 tombstones entirely.   If a table allows updates to TTL then it’s possible a 
 different node will have the row without the TTL and the tombstone would be 
 needed.
 Or am I missing something?
 Disallowing updates to rows would seem to enable optimizations in general.   
 Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
 tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
 Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6611) Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donald Smith updated CASSANDRA-6611:


Summary: Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the 
need for tombstones  (was: Allow for FINAL ttls and FINAL (immutable) inserts 
to eliminate the need for tombstones)

 Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for 
 tombstones
 -

 Key: CASSANDRA-6611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith

 Suppose you're not allowed to update the TTL of a column (cell) -- either 
 because CQL is extended to allow syntax like USING *FINAL* TTL 86400 or 
 because there were a table option saying that TTL is immutable.
 If you never update the TTL of a column, then there should be no need for 
 tombstones at all:  any replicas will have the same TTL.  So there’d be no 
 risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
 purpose of a tombstone is to cover the case where a different node was down 
 and it didn’t notice the delete and it still had the column and tried to 
 replicate it back; but that won’t happen if it too had the TTL.
 So, if – and it’s a big if – a table disallowed updates to TTL, then you 
 could really optimize deletion of TTLed columns: you could do away with 
 tombstones entirely.   If a table allows updates to TTL then it’s possible a 
 different node will have the row without the TTL and the tombstone would be 
 needed.
 Or am I missing something?
 Disallowing updates to rows would seem to enable optimizations in general.   
 Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
 tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
 Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6611) Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for tombstones

2014-01-22 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879464#comment-13879464
 ] 

Donald Smith commented on CASSANDRA-6611:
-

I see. But setting gc_grace_seconds to zero will affect deletes other than TTL 
expirations, won't it? So, I want something in the TABLE declaration that 
states this more declaratively.
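
For contrast, a sketch of what can be declared today versus the kind of option being 
asked for (the table name is made up; the second statement is hypothetical, not valid CQL):
{noformat}
-- Today: purge tombstones immediately, but this applies to every delete on the table
ALTER TABLE reports WITH gc_grace_seconds = 0;

-- Wanted: declare that rows are write-once with an immutable TTL
-- (hypothetical option, not implemented)
ALTER TABLE reports WITH immutable_ttl = true;
{noformat}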

 Allow for FINAL ttls and FINAL inserts or TABLEs to eliminate the need for 
 tombstones
 -

 Key: CASSANDRA-6611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6611
 Project: Cassandra
  Issue Type: New Feature
Reporter: Donald Smith

 Suppose you're not allowed to update the TTL of a column (cell) -- either 
 because CQL is extended to allow syntax like USING *FINAL* TTL 86400 or 
 because there were a table option saying that TTL is immutable.
 If you never update the TTL of a column, then there should be no need for 
 tombstones at all:  any replicas will have the same TTL.  So there’d be no 
 risk of missed deletes.  You wouldn’t even need GCable tombstones.  The 
 purpose of a tombstone is to cover the case where a different node was down 
 and it didn’t notice the delete and it still had the column and tried to 
 replicate it back; but that won’t happen if it too had the TTL.
 So, if – and it’s a big if – a table disallowed updates to TTL, then you 
 could really optimize deletion of TTLed columns: you could do away with 
 tombstones entirely.   If a table allows updates to TTL then it’s possible a 
 different node will have the row without the TTL and the tombstone would be 
 needed.
 Or am I missing something?
 Disallowing updates to rows would seem to enable optimizations in general.   
 Write-once, non-updatable rows are a common use case. If cassandra had FINAL 
 tables (or FINAL INSERTS) then it could eliminate tombstones for those too. 
 Probably other optimizations would be enabled too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6586) Cassandra touches all columns on CQL3 select

2014-01-15 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872555#comment-13872555
 ] 

Donald Smith commented on CASSANDRA-6586:
-

To clarify (and correct me if I'm wrong), this ticket does *not* imply that all 
physical thrift columns (cells) of a physical thrift row (partition) are read 
when you do a CQL select on a CQL primary key. It just means that all columns 
mentioned in the CQL primary key are read. There's still a lot of confusion 
between thrift terminology and CQL terminology.

You can still have wide rows and you still can avoid reading all the (physical 
thrift) columns of that row.
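
A small illustration of that point, using a made-up table: a CQL table with a clustering 
column still maps many CQL rows onto one wide partition, and a query that restricts the 
clustering column should only touch the requested slice of that partition:
{noformat}
CREATE TABLE metrics (
    sensor_id text,      -- partition key: one wide storage row per sensor
    ts timestamp,        -- clustering column: one group of cells per reading
    value double,
    PRIMARY KEY (sensor_id, ts)
);

-- Reads only the slice of the 'sensor-1' partition between the two timestamps,
-- not every cell in that partition
SELECT ts, value FROM metrics
WHERE sensor_id = 'sensor-1'
  AND ts >= '2014-01-01' AND ts < '2014-01-02';
{noformat}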

People are still confused by terminology and this scares them unnecessarily.  
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows 
explains the terminology to use, but it's still unclear.

 Cassandra touches all columns on CQL3 select
 

 Key: CASSANDRA-6586
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6586
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jan Chochol
Priority: Minor

 It seems that Cassandra is checking (garbage collecting) all columns of all 
 returned rows, despite the fact that not all columns are requested.
 Example:
 * use following script to fill Cassandra with test data:
 {noformat}
 perl -e "print(\"DROP KEYSPACE t;\nCREATE KEYSPACE t WITH replication = 
 {'class': 'SimpleStrategy', 'replication_factor' : 1};\nuse t;\nCREATE TABLE 
 t (a varchar PRIMARY KEY, b varchar, c varchar, d varchar);\nCREATE INDEX t_b 
 ON t (b);\nCREATE INDEX t_c ON t (c);\nCREATE INDEX t_d ON t (d);\n\");\$max 
 = 200; for(\$i = 0; \$i < \$max; \$i++) { \$j = int(\$i * 10 / \$max); \$k = 
 int(\$i * 100 / \$max); print(\"INSERT INTO t (a, b, c, d) VALUES ('a\$i', 
 'b\$j', 'c\$k', 'd\$i');\n\")}\n" | cqlsh
 {noformat}
 * turn on {{ALL}} logging for Cassandra
 * issue this query:
 {noformat}
 select a from t where c = 'c1';
 {noformat}
 This is result:
 {noformat}
 [root@jch3-devel:~/c4] cqlsh --no-color
 Connected to C4 Cluster Single at localhost:9160.
 [cqlsh 3.1.7 | Cassandra 1.2.11-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 
 19.36.1]
 Use HELP for help.
 cqlsh> use t;
 cqlsh:t> select a from t where c = 'c1';
  a
 ----
  a3
  a2
 {noformat}
 From Cassandra log:
 {noformat}
 2014-01-15 09:14:56.663+0100 [Thrift:1] [TRACE] QueryProcessor.java(125) 
 org.apache.cassandra.cql3.QueryProcessor: component=c4 Process 
 org.apache.cassandra.cql3.statements.SelectStatement@614b3189 @CL.ONE
 2014-01-15 09:14:56.810+0100 [Thrift:1] [TRACE] ReadCallback.java(67) 
 org.apache.cassandra.service.ReadCallback: component=c4 Blockfor is 1; 
 setting up requests to /127.0.0.1
 2014-01-15 09:14:56.816+0100 [ReadStage:2] [DEBUG] 
 CompositesSearcher.java(112) 
 org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
 Most-selective indexed predicate is 't.c EQ c1'
 2014-01-15 09:14:56.817+0100 [ReadStage:2] [TRACE] 
 ColumnFamilyStore.java(1493) org.apache.cassandra.db.ColumnFamilyStore: 
 component=c4 Filtering 
 org.apache.cassandra.db.index.composites.CompositesSearcher$1@e15911 for rows 
 matching 
 org.apache.cassandra.db.filter.ExtendedFilter$FilterWithCompositeClauses@4a9e6b8a
 2014-01-15 09:14:56.817+0100 [ReadStage:2] [TRACE] 
 CompositesSearcher.java(237) 
 org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
 Scanning index 't.c EQ c1' starting with 
 2014-01-15 09:14:56.820+0100 [ReadStage:2] [TRACE] SSTableReader.java(776) 
 org.apache.cassandra.io.sstable.SSTableReader: component=c4 Adding cache 
 entry for KeyCacheKey(/mnt/ebs/cassandra/data/t/t/t-t.t_c-ic-1, 6331) - 
 org.apache.cassandra.db.RowIndexEntry@66a6574b
 2014-01-15 09:14:56.821+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
 org.apache.cassandra.db.filter.SliceQueryFilter: component=c4 collecting 0 of 
 1: 6133:false:0@1389773577394000
 2014-01-15 09:14:56.821+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
 org.apache.cassandra.db.filter.SliceQueryFilter: component=c4 collecting 1 of 
 1: 6132:false:0@1389773577391000
 2014-01-15 09:14:56.822+0100 [ReadStage:2] [TRACE] 
 CompositesSearcher.java(313) 
 org.apache.cassandra.db.index.composites.CompositesSearcher: component=c4 
 Adding index hit to current row for 6133
 2014-01-15 09:14:56.825+0100 [ReadStage:2] [TRACE] SSTableReader.java(776) 
 org.apache.cassandra.io.sstable.SSTableReader: component=c4 Adding cache 
 entry for KeyCacheKey(/mnt/ebs/cassandra/data/t/t/t-t-ic-1, 6133) - 
 org.apache.cassandra.db.RowIndexEntry@32ad3193
 2014-01-15 09:14:56.826+0100 [ReadStage:2] [TRACE] SliceQueryFilter.java(164) 
 org.apache.cassandra.db.filter.SliceQueryFilter: component=c4 collecting 0 of 
 2147483647: :false:0@1389773577394000
 

[jira] [Commented] (CASSANDRA-5396) Repair process is a joke leading to a downward spiralling and eventually unusable cluster

2013-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858966#comment-13858966
 ] 

Donald Smith commented on CASSANDRA-5396:
-

We ran nodetool repair -pr on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it.

 Repair process is a joke leading to a downward spiralling and eventually 
 unusable cluster
 -

 Key: CASSANDRA-5396
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5396
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.3
 Environment: all
Reporter: David Berkman
Priority: Critical

 Let's review the repair process...
 1) It's mandatory to run repair.
 2) Repair has a high impact and can take hours.
 3) Repair provides no estimation of completion time and no progress indicator.
 4) Repair is extremely fragile, and can fail to complete, or become stuck 
 quite easily in real operating environments.
 5) When repair fails it provides no feedback whatsoever of the problem or 
 possible resolution.
 6) A failed repair operation saddles the affected nodes with a huge amount of 
 extra data (judging from node size).
 7) There is no way to rid the node of the extra data associated with a failed 
 repair short of completely rebuilding the node.
 8) The extra data from a failed repair makes any subsequent repair take 
 longer and increases the likelihood that it will simply become stuck or fail, 
 leading to yet more node corruption.
 9) Eventually no repair operation will complete successfully, and node 
 operations will eventually become impacted leading to a failing cluster.
 Who would design such a system for a service meant to operate as a fault 
 tolerant clustered data store operating on a lot of commodity hardware?
 Solution...
 1) Repair must be robust.
 2) Repair must *never* become 'stuck'.
 3) Failure to complete must result in reasonable feedback.
 4) Failure to complete must not result in a node whose state is worse than 
 before the operation began.
 5) Repair must provide some means of determining completion percentage.
 6) It would be nice if repair could estimate its run time, even if it could 
 do so only based upon previous runs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5396) Repair process is a joke leading to a downward spiralling and eventually unusable cluster

2013-12-30 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13858966#comment-13858966
 ] 

Donald Smith edited comment on CASSANDRA-5396 at 12/30/13 6:14 PM:
---

We ran nodetool repair -pr on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it. Also, our data isn't 
critical.


was (Author: thinkerfeeler):
We ran nodetool repair -pr on one node of a three node cluster running on 
production-quality hardware, each node with about 1TB of data. It was using 
cassandra version 2.0.3. After 5 days it was still running and had apparently 
frozen.  See https://issues.apache.org/jira/browse/CASSANDRA-5220 (Dec 23 
comment by Donald Smith) for more detail.  We tried running repair on our 
smallest column family (with 12G of data), and it took 31 hours to complete.
We're not yet in production but we plan on not running repair, since we do very 
few deletes or updates and since we don't trust it.

 Repair process is a joke leading to a downward spiralling and eventually 
 unusable cluster
 -

 Key: CASSANDRA-5396
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5396
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.3
 Environment: all
Reporter: David Berkman
Priority: Critical

 Let's review the repair process...
 1) It's mandatory to run repair.
 2) Repair has a high impact and can take hours.
 3) Repair provides no estimation of completion time and no progress indicator.
 4) Repair is extremely fragile, and can fail to complete, or become stuck 
 quite easily in real operating environments.
 5) When repair fails it provides no feedback whatsoever of the problem or 
 possible resolution.
 6) A failed repair operation saddles the affected nodes with a huge amount of 
 extra data (judging from node size).
 7) There is no way to rid the node of the extra data associated with a failed 
 repair short of completely rebuilding the node.
 8) The extra data from a failed repair makes any subsequent repair take 
 longer and increases the likelihood that it will simply become stuck or fail, 
 leading to yet more node corruption.
 9) Eventually no repair operation will complete successfully, and node 
 operations will eventually become impacted leading to a failing cluster.
 Who would design such a system for a service meant to operate as a fault 
 tolerant clustered data store operating on a lot of commodity hardware?
 Solution...
 1) Repair must be robust.
 2) Repair must *never* become 'stuck'.
 3) Failure to complete must result in reasonable feedback.
 4) Failure to complete must not result in a node whose state is worse than 
 before the operation began.
 5) Repair must provide some means of determining completion percentage.
 6) It would be nice if repair could estimate its run time, even if it could 
 do so only based upon previous runs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855774#comment-13855774
 ] 

Donald Smith commented on CASSANDRA-5220:
-

 We ran nodetool repair on a 3-node cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This is still a 
test environment.  After 5 days the repair job still hasn't finished. I can see it's 
still running.

Here's the process:
{noformat}
root 30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of nodetool tpstats:
{noformat}
cass3 /tmp nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   884
MUTATION   1407711
_TRACE   0
REQUEST_RESPONSE 0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest column family, as reported by nodetool cfstats:

[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855778#comment-13855778
 ] 

Donald Smith commented on CASSANDRA-5351:
-

As reported in https://issues.apache.org/jira/browse/CASSANDRA-5220, we ran 
nodetool repair on a 3-node cassandra v. 2.0.3 cluster using production 
hardware, on realistic test data. Each node has about 1TB of data. After over 
five days, the repair job is still running.

 Avoid repairing already-repaired data by default
 

 Key: CASSANDRA-5351
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jonathan Ellis
Assignee: Lyuben Todorov
  Labels: repair
 Fix For: 2.1


 Repair has always built its merkle tree from all the data in a columnfamily, 
 which is guaranteed to work but is inefficient.
 We can improve this by remembering which sstables have already been 
 successfully repaired, and only repairing sstables new since the last repair. 
  (This automatically makes CASSANDRA-3362 much less of a problem too.)
 The tricky part is, compaction will (if not taught otherwise) mix repaired 
 data together with non-repaired.  So we should segregate unrepaired sstables 
 from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855774#comment-13855774
 ] 

Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 6:21 PM:
---

 We ran nodetool repair on a 3 node cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This is still 
testing.  After 5 days the repair job still hasn't finished. I can see it's 
still running.

Here's the process:
{noformat}
root 30835 30774  0 Dec17 pts/000:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of nodetool tpstats:
{noformat}
cass3 /tmp nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool NameActive   Pending  Completed   Blocked  All 
time blocked
ReadStage 1 0   38083403 0  
   0
RequestResponseStage  0 0 1951200451 0  
   0
MutationStage 0 0 2853354069 0  
   0
ReadRepairStage   0 03794926 0  
   0
ReplicateOnWriteStage 0 0  0 0  
   0
GossipStage   0 04880147 0  
   0
AntiEntropyStage  1 3  9 0  
   0
MigrationStage0 0 30 0  
   0
MemoryMeter   0 0115 0  
   0
MemtablePostFlusher   0 0  75121 0  
   0
FlushWriter   0 0  49934 0  
  52
MiscStage 0 0  0 0  
   0
PendingRangeCalculator0 0  7 0  
   0
commitlog_archiver0 0  0 0  
   0
AntiEntropySessions   1 1  1 0  
   0
InternalResponseStage 0 0  9 0  
   0
HintedHandoff 0 0   1141 0  
   0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   884
MUTATION   1407711
_TRACE   0
REQUEST_RESPONSE 0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest column 

[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855774#comment-13855774
 ] 

Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 8:22 PM:
---

 We ran nodetool repair on a 3 node cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This is still 
testing.  After 5 days the repair job still hasn't finished. I can see it's 
still running.

Here's the process:
{noformat}
root 30835 30774  0 Dec17 pts/000:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of nodetool tpstats:
{noformat}
cass3 /tmp nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   884
MUTATION   1407711
_TRACE   0
REQUEST_RESPONSE 0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest column 


[jira] [Commented] (CASSANDRA-6215) Possible space leak in datastax.driver.core

2013-10-18 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799228#comment-13799228
 ] 

Donald Smith commented on CASSANDRA-6215:
-

Created https://datastax-oss.atlassian.net/browse/JAVA-201

 Possible space leak in datastax.driver.core
 ---

 Key: CASSANDRA-6215
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6215
 Project: Cassandra
  Issue Type: Bug
  Components: Drivers (now out of tree)
 Environment: CentOS 6.4
Reporter: Donald Smith

 I wrote a Java benchmark app that uses the CQL driver cassandra-driver-core:1.0.3 and 
 repeatedly saves to column families using code like:
 {noformat}
 final Insert writeReportInfo = QueryBuilder.insertInto(KEYSPACE_NAME, 
 REPORT_INFO_TABLE_NAME).value("type", report.type.toString()).value(...) ...
 m_session.execute(writeReportInfo);
 {noformat}
 After running for about an hour, with -Xmx2000m, and writing about 20,000 
 reports (each with about 1 rows), it got: java.lang.OutOfMemoryError: 
 Java heap space.
 Using jmap and jhat I can see that the objects taking up space are 
 {noformat}
  Instance Counts for All Classes (excluding platform)
 1657280 instances of class 
 com.datastax.driver.core.ColumnDefinitions$Definition
 31628 instances of class com.datastax.driver.core.ColumnDefinitions
 31628 instances of class 
 [Lcom.datastax.driver.core.ColumnDefinitions$Definition;
 31627 instances of class com.datastax.driver.core.PreparedStatement
 31627 instances of class org.apache.cassandra.utils.MD5Digest 
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (CASSANDRA-6215) Possible space leak in datastax.driver.core

2013-10-17 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6215:
---

 Summary: Possible space leak in datastax.driver.core
 Key: CASSANDRA-6215
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6215
 Project: Cassandra
  Issue Type: Bug
  Components: Drivers (now out of tree)
 Environment: CentOS 6.4
Reporter: Donald Smith


I wrote a Java benchmark app that uses the CQL driver cassandra-driver-core:1.0.3 and 
repeatedly saves to column families using code like:
{noformat}
final Insert writeReportInfo = QueryBuilder.insertInto(KEYSPACE_NAME, 
    REPORT_INFO_TABLE_NAME).value("type", report.type.toString()).value(...) ...

m_session.execute(writeReportInfo);
{noformat}
After running for about an hour, with -Xmx2000m, and writing about 20,000 
reports (each with about 1 rows), it got: java.lang.OutOfMemoryError: Java 
heap space.

Using jmap and jhat I can see that the objects taking up space are 
{noformat}
 Instance Counts for All Classes (excluding platform)
1657280 instances of class com.datastax.driver.core.ColumnDefinitions$Definition
31628 instances of class com.datastax.driver.core.ColumnDefinitions
31628 instances of class 
[Lcom.datastax.driver.core.ColumnDefinitions$Definition;
31627 instances of class com.datastax.driver.core.PreparedStatement
31627 instances of class org.apache.cassandra.utils.MD5Digest 
...
{noformat}
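
The instance counts above suggest the app ends up holding one PreparedStatement (and its 
ColumnDefinitions) per report. A minimal sketch of the usual fix, assuming the accumulation 
really does come from building or preparing a fresh statement per report: prepare the INSERT 
once and reuse it with bound values. The class name, keyspace, table, and column list below 
are hypothetical, chosen only to mirror the snippet in the description.
{noformat}
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Hedged sketch, not the benchmark's actual code: one PreparedStatement for the whole run,
// so repeated writes only allocate per-row BoundStatements.
public class ReportInfoWriter {
    private final Session session;
    private final PreparedStatement insertReportInfo;   // prepared once, reused for every report

    public ReportInfoWriter(String contactPoint) {
        Cluster cluster = Cluster.builder().addContactPoint(contactPoint).build();
        this.session = cluster.connect();
        // keyspace, table, and columns are assumptions standing in for KEYSPACE_NAME /
        // REPORT_INFO_TABLE_NAME in the description above
        this.insertReportInfo = session.prepare(
                "INSERT INTO as_reports.report_info (report_id, type) VALUES (?, ?)");
    }

    public void write(java.util.UUID reportId, String type) {
        BoundStatement bound = insertReportInfo.bind(reportId, type);   // per-row values only
        session.execute(bound);
    }
}
{noformat}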



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792877#comment-13792877
 ] 

Donald Smith commented on CASSANDRA-6152:
-

No, the test suite does *not* drop and create new tables (i.e., it does not 
call DROP TABLE and CREATE TABLE).  It deletes tables and re-inserts. I'm 
working right now on submitting a focused example that reproduces the bug.

 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith

 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792877#comment-13792877
 ] 

Donald Smith edited comment on CASSANDRA-6152 at 10/11/13 5:56 PM:
---

No, the test suite does *not* drop and create new tables (i.e., it does not 
call DROP TABLE and CREATE TABLE).  It deletes rows from tables and 
re-inserts. I'm working right now on submitting a focused example that 
reproduces the bug.


was (Author: thinkerfeeler):
No, the test suite does *not* drop and create new tables (i.e., it does not 
call DROP TABLE and CREATE TABLE).  It deletes tables and re-inserts. I'm 
working right now on submitting a focused example that reproduces the bug.

 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith

 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793025#comment-13793025
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I found a *simple* example of the bug.  

If I insert an empty string ("") into the table it causes the AssertionError. 
If I insert a non-empty string there's no AssertionError!

{noformat}
create keyspace if not exists bug with replication = {'class':'SimpleStrategy', 
'replication_factor':1};

create table if not exists bug.bug_table (   -- compact; column values are ordered by item_name
    report_id   uuid,
    item_name   text,
    item_value  text,
    primary key (report_id, item_name)) with compact storage;
{noformat}
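
To make the claim concrete, here is a minimal driver-level sketch (hypothetical class name, 
assumed contact point) that issues the two inserts directly against the schema above, so the 
only difference between the statement that works and the one that triggers the AssertionError 
is whether item_name is empty:
{noformat}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Hypothetical minimal repro against bug.bug_table; only the item_name value differs.
public class EmptyStringInsert {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();  // assumed host
        Session session = cluster.connect();
        java.util.UUID id = java.util.UUID.randomUUID();
        session.execute("INSERT INTO bug.bug_table (report_id, item_name, item_value) VALUES ("
                + id + ", 'three', '3')");  // non-empty clustering value: succeeds
        session.execute("INSERT INTO bug.bug_table (report_id, item_name, item_value) VALUES ("
                + id + ", '', '1')");       // empty clustering value: AssertionError server-side
        cluster.shutdown();
    }
}
{noformat}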

BugMain.java:
{noformat}
package bug;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BugMain {
    private static String CASSANDRA_HOST =
            System.getProperty("cassandraServer", "172.17.1.169"); // donalds01lx.uscorp.audsci.com
    private static BugInterface dao = new BugImpl(CASSANDRA_HOST);

    public static void bug() throws IOException {
        List<BugItem> items = new ArrayList<BugItem>();
        items.add(new BugItem("", 1, 2, 3));   // if you change the empty string
                                               // to a non-empty string, the AssertionError goes away!
        items.add(new BugItem("twp", 2, 2, 3));
        items.add(new BugItem("three", 3, 2, 3));
        items.add(new BugItem("four", 4, 2, 3));
        dao.saveReport(items);
    }

    public static void main(String[] args) throws IOException {
        try {
            for (int i = 0; i < 1000; i++) {
                System.out.println("\ndas: iteration " + i + "\n");
                bug();
            }
        } finally {
            dao.shutdown();
        }
    }
}
{noformat}

BugItem.java:
{noformat}
package bug;

public class BugItem {
    public String name;
    public long long1;
    public long long2;
    public long long3;
    public BugItem(String string, long i, long j, long k) {
        name = string;
        long1 = i;
        long2 = j;
        long3 = k;
    }
    public String toString() {
        return "Item with name = " + name + ", long1 = " + long1
                + ", long2 = " + long2 + ", long3 = " + long3;
    }
}
{noformat}

BugInterface.java:
{noformat}
package bug;

import java.util.List;

public interface BugInterface {
    public static final String VALUE_DELIMITER = ":";
    public static final String HIERARCHY_DELIMITER = "  ";
    void saveReport(List<BugItem> item);

    void connect();
    void shutdown();
}
{noformat}

BugImpl.java:
{noformat}
package bug;

import java.text.NumberFormat;
import java.util.List;
import java.util.UUID;

import org.apache.log4j.Logger;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.Insert;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class BugImpl implements BugInterface {
    private static final String CASSANDRA_NODE_PROPERTY = "CASSANDRA_NODE";
    private static final Logger L = Logger.getLogger(new Throwable()
            .getStackTrace()[0].getClassName());
    private static final String KEYSPACE_NAME = "bug";
    private static final String REPORT_DATA_TABLE_NAME = "bug_table";
    private static NumberFormat numberFormat = NumberFormat.getInstance();
    private Cluster m_cluster;
    private Session m_session;
    private int m_writeBatchSize = 64;
    private String m_cassandraNode = "your cassandra hostname here";

    static {
        numberFormat.setMaximumFractionDigits(1);
    }

    public BugImpl() {
        m_cassandraNode = System.getProperty(CASSANDRA_NODE_PROPERTY,
                m_cassandraNode); // Get from command line
    }
    public BugImpl(String cassandraNode) {
        m_cassandraNode = cassandraNode;
    }
    @Override
    public void shutdown() {
        if (m_session != null) { m_session.shutdown(); }
        if (m_cluster != null) { m_cluster.shutdown(); }
    }
    @Override
    public void connect() {
        m_cluster = Cluster.builder().addContactPoint(m_cassandraNode).build();
        m_session = m_cluster.connect();
    }
    // -------------------------------------------------------------------------
    @Override
    public void saveReport(List<BugItem> items) {
        final long time1 = System.currentTimeMillis();
        if (m_session == null) {
            connect();
        }
        UUID reportId = UUID.randomUUID();
        saveReportAux(items, reportId);
        final long time2 = System.currentTimeMillis();
        L.info("saveReport: t=" +
                numberFormat.format((double) (time2 - time1) * 0.001) + " seconds");
    }


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-11 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793070#comment-13793070
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I have a hunch that this bug bites when the column name is "" and the Memtable 
flushes to an SSTable. I notice it happens at about the same iteration of the 
*for* loop in BugMain.java.
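
A self-contained illustration of why an empty name would do this (this is not Cassandra 
source, just a guard of the same shape as the one the stack trace points at in 
ColumnSerializer.serialize): an empty CQL string arrives as a zero-length ByteBuffer, exactly 
the HeapByteBuffer[pos=0 lim=0 cap=0] bind value visible in the TRACE log elsewhere in this 
thread, and a zero-length column name fails the length assertion.
{noformat}
import java.nio.ByteBuffer;

// Illustrative only: mimics an assert-on-name-length guard; run with -ea to see it fire.
public class EmptyNameAssertion {
    static void serializeName(ByteBuffer name) {
        assert name.remaining() > 0 : "zero-length column name";
        // ... the real serializer would write the name here
    }

    public static void main(String[] args) {
        serializeName(ByteBuffer.wrap("three".getBytes()));  // non-empty name: passes
        serializeName(ByteBuffer.wrap(new byte[0]));         // what "" becomes: AssertionError
    }
}
{noformat}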

 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith
Assignee: Sylvain Lebresne
 Fix For: 2.0.2


 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-09 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790717#comment-13790717
 ] 

Donald Smith commented on CASSANDRA-6152:
-

The test just runs our test suite repeatedly. After a few runs it gets the 
following, with TRACE level. I'll document more later.
{noformat}
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,459 Message.java (line 
302) Received: PREPARE INSERT INTO 
as_reports.data_report_details(report_id,item_name,item_value) VALUES 
(7bc2a570-a42b-4632-b245-f0db9255ccc3,?,?);, v=1
TRACE [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 
QueryProcessor.java (line 208) Stored prepared statement 
ef4c655e042ffab2c1f0eef1e53a573e with 2 bind markers
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 Tracing.java (line 
157) request complete
DEBUG [Native-Transport-Requests:11] 2013-10-09 11:40:49,460 Message.java (line 
309) Responding: RESULT PREPARED ef4c655e042ffab2c1f0eef1e53a573e 
[item_name(as_reports, data_report_details), 
org.apache.cassandra.db.marshal.UTF8Type][item_value(as_reports, 
data_report_details), org.apache.cassandra.db.marshal.UTF8Type] 
(resultMetadata=[0 columns]), v=1
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 Message.java (line 
302) Received: EXECUTE ef4c655e042ffab2c1f0eef1e53a573e with 2 values at 
consistency ONE, v=1
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 232) [1] 'java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]'
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 232) [2] 'java.nio.HeapByteBuffer[pos=36 lim=41 
cap=43]'
TRACE [Native-Transport-Requests:13] 2013-10-09 11:40:49,464 
QueryProcessor.java (line 97) Process 
org.apache.cassandra.cql3.statements.UpdateStatement@321baa4a @CL.ONE
ERROR [COMMIT-LOG-WRITER] 2013-10-09 11:40:49,465 CassandraDaemon.java (line 
185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
at 
org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:722)
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,466 Tracing.java (line 
157) request complete
DEBUG [Native-Transport-Requests:13] 2013-10-09 11:40:49,466 Message.java (line 
309) Responding: EMPTY RESULT, v=1
{noformat}

 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith

 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)
Donald Smith created CASSANDRA-6152:
---

 Summary: Assertion error in 2.0.1 at 
db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
With default set up on single node.
I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith



{noformat}
ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
at 
org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
at 
org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:722)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787772#comment-13787772
 ] 

Donald Smith commented on CASSANDRA-6152:
-

The exception seems to happen first during a delete.  Let me know if you need 
more info.

 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith

 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6152) Assertion error in 2.0.1 at db.ColumnSerializer.serialize(ColumnSerializer.java:56)

2013-10-06 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13787808#comment-13787808
 ] 

Donald Smith commented on CASSANDRA-6152:
-

I was running a functional test suite, which populates some tables after 
deleting the old rows for the same keys. 

I ran it by a command like:
{noformat}
repeat 10 ./run-test.sh 
{noformat}
So, it was deleting and writing rows in quick succession.  

If you want to see more detail than that, I'll see what I can provide.


 Assertion error in 2.0.1 at 
 db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 ---

 Key: CASSANDRA-6152
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6152
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS release 6.2 (Final)
 With default set up on single node.
 I also saw this exception in 2.0.0 on a three node cluster.
Reporter: Donald Smith

 {noformat}
 ERROR [COMMIT-LOG-WRITER] 2013-10-06 12:12:36,845 CassandraDaemon.java (line 
 185) Exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:56)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.serialize(ColumnFamilySerializer.java:77)
 at 
 org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:268)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:229)
 at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:352)
 at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:48)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)