[jira] [Updated] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ

2011-12-15 Thread Zenek Kraweznik (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zenek Kraweznik updated CASSANDRA-3636:
---

Description: 
During upgrade from 1.0.6
{code}Setting up cassandra (1.0.6) ...
*error: permission denied on key 'vm.max_map_count'*
dpkg: error processing cassandra (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 cassandra
{code}

  was:
During upgrade from 1.0.6
Setting up cassandra (1.0.6) ...
*error: permission denied on key 'vm.max_map_count'*
dpkg: error processing cassandra (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 cassandra



 cassandra 1.0.6 debian packages will not run on OpenVZ
 --

 Key: CASSANDRA-3636
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
Affects Versions: 1.0.6
 Environment: Debian Linux (stable), OpenVZ container
Reporter: Zenek Kraweznik
Priority: Critical

 During upgrade from 1.0.6
 {code}Setting up cassandra (1.0.6) ...
 *error: permission denied on key 'vm.max_map_count'*
 dpkg: error processing cassandra (--configure):
  subprocess installed post-installation script returned error exit status 255
 Errors were encountered while processing:
  cassandra
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3625) Do something about DynamicCompositeType

2011-12-15 Thread Boris Yen (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170057#comment-13170057
 ] 

Boris Yen edited comment on CASSANDRA-3625 at 12/15/11 9:07 AM:


Not sure if this is doable or has any issues.

I am thinking: why not mimic the way the secondary index is implemented right 
now? Create one extra column family to keep track of the comparator for each 
row. Then, whenever a column is inserted, Cassandra needs to read before 
writing to make sure the new column is valid. This would sacrifice the write 
performance of a DynamicComposite column family, but at least it allows 
Cassandra to perform the validation before the actual write.
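
A minimal sketch of that read-before-write validation, assuming a hypothetical "row comparators" metadata column family; all class and method names below are illustrative stand-ins, not actual Cassandra 1.0 internals.

{code:title=ValidatingWriter.java (illustrative sketch)|borderStyle=solid}
import java.nio.ByteBuffer;

class ComparatorRegistry
{
    // Hypothetical lookup against an extra metadata column family that stores
    // one comparator spec per row key.
    ByteBuffer lookupComparatorFor(ByteBuffer rowKey) { return null; /* read from metadata CF */ }
    void recordComparatorFor(ByteBuffer rowKey, ByteBuffer comparatorSpec) { /* write to metadata CF */ }
}

class ValidatingWriter
{
    private final ComparatorRegistry registry = new ComparatorRegistry();

    void insert(ByteBuffer rowKey, ByteBuffer columnName, ByteBuffer comparatorSpec)
    {
        ByteBuffer known = registry.lookupComparatorFor(rowKey);   // the extra read before the write
        if (known == null)
            registry.recordComparatorFor(rowKey, comparatorSpec);  // first insert fixes the row's structure
        else if (!known.equals(comparatorSpec))
            throw new IllegalArgumentException("column name structure does not match this row");
        // ... proceed with the normal write path
    }
}
{code}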

  was (Author: yulinyen):
Not sure if this is doable or has an issue. 

I am thinking why not mimic that way the secondary index is implemented right 
now. Create one extra column family for keeping track of the comparators for 
each row. So, whenever a column is inserted to the cassandra, the cassandra 
needs to read before write to make sure the new column is valid. This would 
sacrifice the write performance of dynamicComposite column family, but at least 
it allows the cassandra to perform the validation before the actual write.
  
 Do something about DynamicCompositeType
 ---

 Key: CASSANDRA-3625
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne

 Currently, DynamicCompositeType is a super dangerous type. We cannot leave it 
 that way or people will get hurt.
 Let's recall that DynamicCompositeType allows composite column names without 
 any limitation on what each component type can be. It was added basically to 
 allow different rows of the same column family to each store a different 
 index. So for instance you would have:
 {noformat}
 index1: {
   bar:24 -> someval
   bar:42 -> someval
   foo:12 -> someval
   ...
 }
 index2: {
   0:uuid1:3.2 -> someval
   1:uuid2:2.2 -> someval
   ...
 }
 
 {noformat}
 where index1, index2, ... are rows.
 So each row has columns whose names have a similar structure (so they can be 
 compared), but between rows the structure can be different (we never compare 
 two columns from two different rows).
 But the problem is the following: what happens if, in the index1 row above, 
 you insert a column whose name is 0:uuid1? There is no really meaningful way 
 to compare bar:24 and 0:uuid1. The current implementation of 
 DynamicCompositeType, when confronted with this, says that it is a user error 
 and throws a MarshalException.
 The problem with that is that the exception is not thrown at insert time, and 
 it *cannot* be, because of the dynamic nature of the comparator. But that 
 means that if you do insert the wrong column in the wrong row, you end up 
 *corrupting* an sstable.
 It is too dangerous a behavior. And it's probably made worse by the fact that 
 some people probably think that DynamicCompositeType should be superior to 
 CompositeType since, you know, it's dynamic.
 One solution to that problem could be to decide on some arbitrary (but 
 predictable) order between two incomparable components. For example, we could 
 decide that IntType < LongType < StringType ...
 Note that even if we do that, I would suggest renaming 
 DynamicCompositeType to something that suggests that CompositeType is always 
 preferable to DynamicCompositeType unless you're really doing very advanced 
 stuff.
 Opinions?
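
 A small sketch of the "predictable order between incomparable components" idea: when two components have different types, fall back to a fixed ranking of the type names instead of throwing. The ranking and type names below are assumptions for illustration only, not the actual comparator code.

 {code:title=ComponentTypeOrder.java (illustrative sketch)|borderStyle=solid}
 import java.util.Arrays;
 import java.util.List;

 final class ComponentTypeOrder
 {
     // Assumed ranking, mirroring the example IntType < LongType < StringType.
     private static final List<String> RANK = Arrays.asList("IntegerType", "LongType", "UTF8Type");

     static int compareTypes(String leftType, String rightType)
     {
         if (leftType.equals(rightType))
             return 0; // same type: defer to that type's own comparator
         int l = RANK.indexOf(leftType);
         int r = RANK.indexOf(rightType);
         if (l >= 0 && r >= 0)
             return l < r ? -1 : 1;
         // types outside the ranking still get a deterministic (if arbitrary) order
         return leftType.compareTo(rightType);
     }
 }
 {code}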





[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType

2011-12-15 Thread Boris Yen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170057#comment-13170057
 ] 

Boris Yen commented on CASSANDRA-3625:
--

Not sure if this is doable or has an issue.

I am thinking: why not mimic the way the secondary index is implemented right 
now? Create one extra column family to keep track of the comparator for each 
row. Then, whenever a column is inserted, Cassandra needs to read before 
writing to make sure the new column is valid. This would sacrifice the write 
performance of a DynamicComposite column family, but at least it allows 
Cassandra to perform the validation before the actual write.

 Do something about DynamicCompositeType
 ---

 Key: CASSANDRA-3625
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne

 Currently, DynamicCompositeType is a super dangerous type. We cannot leave it 
 that way or people will get hurt.
 Let's recall that DynamicCompositeType allows composite column names without 
 any limitation on what each component type can be. It was added basically to 
 allow different rows of the same column family to each store a different 
 index. So for instance you would have:
 {noformat}
 index1: {
   bar:24 -> someval
   bar:42 -> someval
   foo:12 -> someval
   ...
 }
 index2: {
   0:uuid1:3.2 -> someval
   1:uuid2:2.2 -> someval
   ...
 }
 
 {noformat}
 where index1, index2, ... are rows.
 So each row has columns whose names have a similar structure (so they can be 
 compared), but between rows the structure can be different (we never compare 
 two columns from two different rows).
 But the problem is the following: what happens if, in the index1 row above, 
 you insert a column whose name is 0:uuid1? There is no really meaningful way 
 to compare bar:24 and 0:uuid1. The current implementation of 
 DynamicCompositeType, when confronted with this, says that it is a user error 
 and throws a MarshalException.
 The problem with that is that the exception is not thrown at insert time, and 
 it *cannot* be, because of the dynamic nature of the comparator. But that 
 means that if you do insert the wrong column in the wrong row, you end up 
 *corrupting* an sstable.
 It is too dangerous a behavior. And it's probably made worse by the fact that 
 some people probably think that DynamicCompositeType should be superior to 
 CompositeType since, you know, it's dynamic.
 One solution to that problem could be to decide on some arbitrary (but 
 predictable) order between two incomparable components. For example, we could 
 decide that IntType < LongType < StringType ...
 Note that even if we do that, I would suggest renaming 
 DynamicCompositeType to something that suggests that CompositeType is always 
 preferable to DynamicCompositeType unless you're really doing very advanced 
 stuff.
 Opinions?





[jira] [Commented] (CASSANDRA-3615) CommitLog BufferOverflowException

2011-12-15 Thread Vitalii Tymchyshyn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170092#comment-13170092
 ] 

Vitalii Tymchyshyn commented on CASSANDRA-3615:
---

I've got a similar problem on 1.0.5:


ERROR [COMMIT-LOG-WRITER] 2011-12-13 21:11:57,004 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.nio.BufferOverflowException
at java.nio.Buffer.nextPutIndex(Buffer.java:518)
at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:664)
at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:244)
at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:567)
at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:679)

It seems that there is an inconsistency between the space check and the actual write: 
org.apache.cassandra.db.commitlog.CommitLogSegment#hasCapacityFor checks for 
serialized length + ENTRY_OVERHEAD_SIZE, where ENTRY_OVERHEAD_SIZE = 4 + 8 + 8.

At the same time, write() also writes an END_OF_SEGMENT_MARKER int, so 
ENTRY_OVERHEAD_SIZE should be 4 + 8 + 8 + 4.
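
A minimal sketch of the capacity check with the end-of-segment marker counted in, as suggested above; the constants and names are illustrative, the real logic lives in CommitLogSegment.

{code:title=CapacityCheck.java (illustrative sketch)|borderStyle=solid}
final class CapacityCheck
{
    static final int ENTRY_OVERHEAD_SIZE = 4 + 8 + 8;  // per the analysis above
    static final int END_OF_SEGMENT_MARKER_SIZE = 4;   // the trailing int written by write()

    // Reserve room for the entry overhead *and* the end-of-segment marker,
    // otherwise the subsequent write can overflow the buffer as in the stack trace.
    static boolean hasCapacityFor(long serializedSize, long remainingInSegment)
    {
        return serializedSize + ENTRY_OVERHEAD_SIZE + END_OF_SEGMENT_MARKER_SIZE <= remainingInSegment;
    }
}
{code}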


 CommitLog BufferOverflowException
 -

 Key: CASSANDRA-3615
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3615
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1
Reporter: Rick Branson
Assignee: Rick Branson

 Reported on mailing list 
 http://mail-archives.apache.org/mod_mbox/cassandra-dev/201112.mbox/%3CCAJHHpg2Rw_BWFJ9DycRGSYkmwMwrJDK3%3Dzw3HwRoutWHbUcULw%40mail.gmail.com%3E
 ERROR 14:07:31,215 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.nio.BufferOverflowException
 at java.nio.Buffer.nextPutIndex(Buffer.java:501)
 at java.nio.DirectByteBuffer.putInt(DirectByteBuffer.java:654)
 at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:259)
 at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:568)
 at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Thread.java:662)
  INFO 14:07:31,504 flushing high-traffic column family CFS(Keyspace='***',
 ColumnFamily='***') (estimated 103394287 bytes)
 It happened during a fairly standard load process using M/R.





[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and scheduled repairs

2011-12-15 Thread Dominic Williams (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Williams updated CASSANDRA-3620:


Summary: Proposal for distributed deletes - fully automatic Reaper Model 
rather than GCSeconds and scheduled repairs  (was: Proposal for distributed 
deletes - use Reaper Model rather than GCSeconds and scheduled repairs)

 Proposal for distributed deletes - fully automatic Reaper Model rather than 
 GCSeconds and scheduled repairs
 -

 Key: CASSANDRA-3620
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dominic Williams
  Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

 Here is a proposal for an improved system for handling distributed deletes.
 h2. The Problem
 There are various issues with repair:
 * Repair is expensive anyway
 * Repair jobs are often made more expensive than they should be by other 
 issues (nodes dropping requests, hinted handoff not working, downtime etc)
 * Repair processes can often fail and need restarting, for example in cloud 
 environments where network issues make a node disappear 
 from the ring for a brief moment
 * When you fail to run repair within GCSeconds, either by error or because of 
 issues with Cassandra, data written to a node that did not see a later delete 
 can reappear (and a node might miss a delete for several reasons including 
 being down or simply dropping requests during load shedding)
 * If you cannot run repair and have to increase GCSeconds to prevent deleted 
 data reappearing, in some cases the growing tombstone overhead can 
 significantly degrade performance
 Because of the foregoing, in high throughput environments it can be very 
 difficult to make repair a cron job. It can be preferable to keep a terminal 
 open and run repair jobs one by one, making sure they succeed and keeping an 
 eye on overall load to reduce system impact. This isn't desirable, and 
 problems are exacerbated when there are lots of column families in a database 
 or it is necessary to run a column family with a low GCSeconds to reduce 
 tombstone load (because there are many write/deletes to that column family). 
 The database owner must run repair within the GCSeconds window, or increase 
 GCSeconds, to avoid potentially losing delete operations. 
 It would be much better if there was no ongoing requirement to run repair to 
 ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be 
 an optional maintenance utility used in special cases, or to ensure ONE reads 
 get consistent data. 
 h2. Reaper Model Proposal
 # Tombstones do not expire, and there is no GCSeconds
 # Tombstones have associated ACK lists, which record the replicas that have 
 acknowledged them
 # Tombstones are only deleted (or marked for compaction) when they have been 
 acknowledged by all replicas
 # When a tombstone is deleted, it is added to a fast relic index of MD5 
 hashes of cf-key-name[-subName]-ackList. The relic index makes it possible 
 for a reaper to acknowledge a tombstone after it is deleted
 # Background reaper threads constantly stream ACK requests to other nodes, 
 and stream ACK responses back to requests they have received (throttling 
 their usage of CPU and bandwidth so as not to affect performance)
 # If a reaper receives a request to ACK a tombstone that does not exist, it 
 creates the tombstone and adds an ACK for the requestor, and replies with an 
 ACK 
 NOTES
 * The existence of entries in the relic index does not affect normal query 
 performance
 * If a node goes down, and comes up after a configurable relic entry timeout, 
 the worst that can happen is that a tombstone that hasn't received all its 
 acknowledgements is re-created across the replicas when the reaper requests 
 their acknowledgements (which is no big deal since this does not corrupt data)
 * Since early removal of entries in the relic index does not cause 
 corruption, it can be kept small, or even kept in memory
 * Simple to implement and predictable 
 h3. Planned Benefits
 * Operations are finely grained (reaper interruption is not an issue)
 * The labour & administration overhead associated with running repair can be 
 removed
 * Reapers can utilize spare cycles and run constantly in background to 
 prevent the load spikes and performance issues associated with repair
 * There will no longer be the threat of corruption if repair can't be run for 
 some reason (for example because of a new adopter's lack of Cassandra 
 expertise, a cron script failing, or Cassandra bugs preventing repair being 
 run etc)
 * 

[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs

2011-12-15 Thread Dominic Williams (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Williams updated CASSANDRA-3620:


Description: 
Proposal for an improved system for handling distributed deletes, which removes 
the requirement to regularly run repair processes to maintain performance and 
data integrity. 

h2. The Problem

There are various issues with repair:

* Repair is expensive anyway
* Repair jobs are often made more expensive than they should be by other issues 
(nodes dropping requests, hinted handoff not working, downtime etc)
* Repair processes can often fail and need restarting, for example in cloud 
environments where network issues make a node disappear 
from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of 
issues with Cassandra, data written to a node that did not see a later delete 
can reappear (and a node might miss a delete for several reasons including 
being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted 
data reappearing, in some cases the growing tombstone overhead can 
significantly degrade performance

Because of the foregoing, in high throughput environments it can be very 
difficult to make repair a cron job. It can be preferable to keep a terminal 
open and run repair jobs one by one, making sure they succeed and keeping an 
eye on overall load to reduce system impact. This isn't desirable, and problems 
are exacerbated when there are lots of column families in a database or it is 
necessary to run a column family with a low GCSeconds to reduce tombstone load 
(because there are many write/deletes to that column family). The database 
owner must run repair within the GCSeconds window, or increase GCSeconds, to 
avoid potentially losing delete operations. 

It would be much better if there was no ongoing requirement to run repair to 
ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be an 
optional maintenance utility used in special cases, or to ensure ONE reads get 
consistent data. 

h2. Reaper Model Proposal

# Tombstones do not expire, and there is no GCSeconds
# Tombstones have associated ACK lists, which record the replicas that have 
acknowledged them
# Tombstones are only deleted (or marked for compaction) when they have been 
acknowledged by all replicas
# When a tombstone is deleted, it is added to a fast relic index of MD5 
hashes of cf-key-name[-subName]-ackList. The relic index makes it possible for 
a reaper to acknowledge a tombstone after it is deleted
# Background reaper threads constantly stream ACK requests to other nodes, 
and stream ACK responses back to requests they have received (throttling 
their usage of CPU and bandwidth so as not to affect performance)
# If a reaper receives a request to ACK a tombstone that does not exist, it 
creates the tombstone and adds an ACK for the requestor, and replies with an 
ACK 

NOTES

* The existence of entries in the relic index does not affect normal query 
performance
* If a node goes down, and comes up after a configurable relic entry timeout, 
the worst that can happen is that a tombstone that hasn't received all its 
acknowledgements is re-created across the replicas when the reaper requests 
their acknowledgements (which is no big deal since this does not corrupt data)
* Since early removal of entries in the relic index does not cause corruption, 
it can be kept small, or even kept in memory
* Simple to implement and predictable 

h3. Planned Benefits

* Operations are finely grained (reaper interruption is not an issue)
* The labour & administration overhead associated with running repair can be 
removed
* Reapers can utilize spare cycles and run constantly in background to 
prevent the load spikes and performance issues associated with repair
* There will no longer be the threat of corruption if repair can't be run for 
some reason (for example because of a new adopter's lack of Cassandra 
expertise, a cron script failing, or Cassandra bugs preventing repair being run 
etc)
* Deleting tombstones earlier, thereby reducing the number involved in query 
processing, will often dramatically improve performance
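
A rough sketch of the data structures the Reaper Model above describes (per-tombstone ACK lists plus an in-memory relic index of MD5 hashes). Everything here is illustrative Java, not existing Cassandra code, and it ignores persistence, throttling and replica topology.

{code:title=ReaperState.java (illustrative sketch)|borderStyle=solid}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ReaperState
{
    // replicas that have acknowledged each live tombstone, keyed by tombstone id
    private final ConcurrentHashMap<String, Set<String>> ackLists = new ConcurrentHashMap<String, Set<String>>();
    // relic index: MD5 of cf-key-name[-subName]-ackList for tombstones already removed
    private final Set<String> relicIndex =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    /** Records an ACK; returns true once every replica has acknowledged and the tombstone can be purged. */
    boolean acknowledge(String tombstoneId, String replica, Set<String> allReplicas)
    {
        ackLists.putIfAbsent(tombstoneId, Collections.synchronizedSet(new HashSet<String>()));
        Set<String> acks = ackLists.get(tombstoneId);
        acks.add(replica);
        if (acks.containsAll(allReplicas))
        {
            relicIndex.add(md5(tombstoneId + acks)); // remember the tombstone after it is deleted
            ackLists.remove(tombstoneId);
            return true;
        }
        return false;
    }

    private static String md5(String s)
    {
        try
        {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(s.getBytes()))
                hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new AssertionError(e);
        }
    }
}
{code}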



  was:
Here is a proposal for an improved system for handling distributed deletes.

h2. The Problem

There are various issues with repair:

* Repair is expensive anyway
* Repair jobs are often made more expensive than they should be by other issues 
(nodes dropping requests, hinted handoff not working, downtime etc)
* Repair processes can often fail and need restarting, for example in cloud 
environments where network issues make a node disappear 
from the ring for a brief moment
* When you fail to run repair within GCSeconds, either by error or because of 
issues with Cassandra, data written to a node that did not see a later delete 
can 

[jira] [Created] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra

2011-12-15 Thread MaHaiyang (Created) (JIRA)
It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra
--

 Key: CASSANDRA-3638
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3638
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: MaHaiyang


RangeSliceVerbHandler may query just one row, but Cassandra may iterate the whole memtable.
The problem is in the ColumnFamilyStore.getRangeSlice() method.
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);

    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int) (System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();

    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();

        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
                ...
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Just query one row, but a sublist of columnFamilies is returned{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        {color:red}// entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated{color}
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}
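
One possible mitigation for the pattern flagged above, sketched with simplified types: bound the memtable scan by the query's end key instead of taking an unbounded tailMap. This is illustrative only (it ignores token wrap-around and is not the actual Cassandra fix).

{code:title=BoundedMemtableScan.java (illustrative sketch)|borderStyle=solid}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

final class BoundedMemtableScan
{
    // simplified stand-ins for DecoratedKey -> ColumnFamily
    private final ConcurrentSkipListMap<String, Object> columnFamilies =
        new ConcurrentSkipListMap<String, Object>();

    Iterator<Map.Entry<String, Object>> getEntryIterator(String startWith, String stopAt)
    {
        // subMap() stops at stopAt, so a caller that wants a single row or a
        // narrow range no longer walks the whole tail of the memtable.
        return columnFamilies.subMap(startWith, true, stopAt, true).entrySet().iterator();
    }
}
{code}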











[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra

2011-12-15 Thread MaHaiyang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MaHaiyang updated CASSANDRA-3638:
-

Description: 
RangeSliceVerbHandler may query just one row, but Cassandra may iterate the whole memtable.
The problem is in the ColumnFamilyStore.getRangeSlice() method.
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);

    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int) (System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();

    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();

        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) {color:red}// this iterator may iterate the whole memtable!!{color}
            {
                ...
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{code:title=Memtable.java|borderStyle=solid}
{color:red}// Just query one row, but a sublist of columnFamilies is returned{color}
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        {color:red}// entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated{color}
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}







  was:
RangeSliceVerbHandler may  just only query one row , but cassandra may iterate 
the whole memtable .
the problem is in ColumnFamilyStore.getRangeSlice() method .
{code:title=ColumnFamilyStore.java|borderStyle=solid}
 public ListRow getRangeSlice(ByteBuffer superColumn, final AbstractBounds 
range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
...
DecoratedKey startWith = new DecoratedKey(range.left, null);
DecoratedKey stopAt = new DecoratedKey(range.right, null);

QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, 
superColumn, null), columnFilter);
int gcBefore = (int)(System.currentTimeMillis() / 1000) - 
metadata.getGcGraceSeconds();

ListRow rows;
ViewFragment view = markReferenced(startWith, stopAt);
try
{
CloseableIteratorRow iterator = 
RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, 
stopAt, filter, getComparator(), this);
rows = new ArrayListRow();

try
{
// pull rows out of the iterator
boolean first = true;
while (iterator.hasNext()) {color:red} 
// this iterator may iterate the whole memtable!!
 
{color} 
   {

}
}
  .
}
   .
return rows;
}

{code} 

{code:title=Memtable.java|borderStyle=solid}
{color:red} 
// Just only query one row ,but returned a sublist of columnFamiles   
{color}  
public IteratorMap.EntryDecoratedKey, ColumnFamily 
getEntryIterator(DecoratedKey startWith)
{
return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code} 



{code:title=RowIteratorFactory.java|borderStyle=solid}
 public IColumnIterator computeNext()
{
while (iter.hasNext())
{
Map.EntryDecoratedKey, ColumnFamily entry = iter.next();
IColumnIterator ici = 
filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
{color:red} 
// entry.getKey() will never bigger or equal to startKey, and then iterate the 
whole sublist of memtable 
{color} 
if (pred.apply(ici))  
return ici;
}
return 

[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra

2011-12-15 Thread MaHaiyang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MaHaiyang updated CASSANDRA-3638:
-

Description: 
RangeSliceVerbHandler may query just one row, but Cassandra may iterate the whole memtable.
The problem is in the ColumnFamilyStore.getRangeSlice() method.


{color:red}// this iterator may iterate the whole memtable!!{color}
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);

    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int) (System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();

    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();

        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) // this iterator may iterate the whole memtable!!
            {
                ...
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{color:red}// Just query one row, but a sublist of columnFamilies is returned{color}
{code:title=Memtable.java|borderStyle=solid}
// Just query one row, but a sublist of columnFamilies is returned
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{color:red}// entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated{color}
{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        // entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}

  was:
RangeSliceVerbHandler may  just only query one row , but cassandra may iterate 
the whole memtable .
the problem is in ColumnFamilyStore.getRangeSlice() method .


{color:red} // this iterator may iterate the whole memtable!!{color}
{code:title=ColumnFamilyStore.java|borderStyle=solid}
 public ListRow getRangeSlice(ByteBuffer superColumn, final AbstractBounds 
range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
...
DecoratedKey startWith = new DecoratedKey(range.left, null);
DecoratedKey stopAt = new DecoratedKey(range.right, null);

QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, 
superColumn, null), columnFilter);
int gcBefore = (int)(System.currentTimeMillis() / 1000) - 
metadata.getGcGraceSeconds();

ListRow rows;
ViewFragment view = markReferenced(startWith, stopAt);
try
{
CloseableIteratorRow iterator = 
RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, 
stopAt, filter, getComparator(), this);
rows = new ArrayListRow();

try
{
// pull rows out of the iterator
boolean first = true;
while (iterator.hasNext()) // this iterator may iterate the 
whole memtable!!   
   {

}
}
  .
}
   .
return rows;
}

{code} 

{color:red} // Just only query one row ,but returned a sublist of columnFamiles 
  {color}
{code:title=Memtable.java|borderStyle=solid}
// Just only query one row ,but returned a sublist of columnFamiles 
public IteratorMap.EntryDecoratedKey, ColumnFamily 
getEntryIterator(DecoratedKey startWith)
{
return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code} 


{color:red} // entry.getKey() will never bigger or equal to startKey, and then 
iterate the whole sublist of memtable {color} 
{code:title=RowIteratorFactory.java|borderStyle=solid}
 public IColumnIterator 

[jira] [Updated] (CASSANDRA-3638) It may iterate the whole memtable while querying just one row. This seriously affects the performance of Cassandra

2011-12-15 Thread MaHaiyang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MaHaiyang updated CASSANDRA-3638:
-

Description: 
RangeSliceVerbHandler may query just one row, but Cassandra may iterate the whole memtable.
The problem is in the ColumnFamilyStore.getRangeSlice() method.


{color:red}// this iterator may iterate the whole memtable!!{color}
{code:title=ColumnFamilyStore.java|borderStyle=solid}
public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
    ...
    DecoratedKey startWith = new DecoratedKey(range.left, null);
    DecoratedKey stopAt = new DecoratedKey(range.right, null);

    QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
    int gcBefore = (int) (System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();

    List<Row> rows;
    ViewFragment view = markReferenced(startWith, stopAt);
    try
    {
        CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
        rows = new ArrayList<Row>();

        try
        {
            // pull rows out of the iterator
            boolean first = true;
            while (iterator.hasNext()) // this iterator may iterate the whole memtable!!
            {
                ...
            }
        }
        ...
    }
    ...
    return rows;
}
{code}

{color:red}// Just query one row, but a sublist of columnFamilies is returned{color}
{code:title=Memtable.java|borderStyle=solid}
// Just query one row, but a sublist of columnFamilies is returned
public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
{
    return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code}

{color:red}// entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated{color}
{code:title=RowIteratorFactory.java|borderStyle=solid}
public IColumnIterator computeNext()
{
    while (iter.hasNext())
    {
        Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
        IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
        // entry.getKey() will never be bigger than or equal to startKey, so the whole sublist of the memtable is iterated
        if (pred.apply(ici))
            return ici;
    }
    return endOfData();
{code}







  was:
RangeSliceVerbHandler may  just only query one row , but cassandra may iterate 
the whole memtable .
the problem is in ColumnFamilyStore.getRangeSlice() method .
{code:title=ColumnFamilyStore.java|borderStyle=solid}
 public ListRow getRangeSlice(ByteBuffer superColumn, final AbstractBounds 
range, int maxResults, IFilter columnFilter)
throws ExecutionException, InterruptedException
{
...
DecoratedKey startWith = new DecoratedKey(range.left, null);
DecoratedKey stopAt = new DecoratedKey(range.right, null);

QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, 
superColumn, null), columnFilter);
int gcBefore = (int)(System.currentTimeMillis() / 1000) - 
metadata.getGcGraceSeconds();

ListRow rows;
ViewFragment view = markReferenced(startWith, stopAt);
try
{
CloseableIteratorRow iterator = 
RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, 
stopAt, filter, getComparator(), this);
rows = new ArrayListRow();

try
{
// pull rows out of the iterator
boolean first = true;
while (iterator.hasNext()) {color:red} // this iterator may 
iterate the whole memtable!!{color} 
   {

}
}
  .
}
   .
return rows;
}

{code} 

{code:title=Memtable.java|borderStyle=solid}
{color:red} // Just only query one row ,but returned a sublist of columnFamiles 
  {color}  
public IteratorMap.EntryDecoratedKey, ColumnFamily 
getEntryIterator(DecoratedKey startWith)
{
return columnFamilies.tailMap(startWith).entrySet().iterator();
}
{code} 



{code:title=RowIteratorFactory.java|borderStyle=solid}
 public IColumnIterator computeNext()
{
while (iter.hasNext())
{
Map.EntryDecoratedKey, ColumnFamily entry = iter.next();
IColumnIterator ici = 
filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), 

[jira] [Updated] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs

2011-12-15 Thread Dominic Williams (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Williams updated CASSANDRA-3620:


Description: 
Proposal for an improved system for handling distributed deletes, which removes 
the requirement to regularly run repair processes to maintain performance and 
data integrity. 

h2. The Problem

There are various issues with repair:

* Repair is expensive to run
* Repair jobs are often made more expensive than they should be by other issues 
(nodes dropping requests, hinted handoff not working, downtime etc)
* Repair processes can often fail and need restarting, for example in cloud 
environments where network issues make a node disappear from the ring for a 
brief moment
* When you fail to run repair within GCSeconds, either by error or because of 
issues with Cassandra, data written to a node that did not see a later delete 
can reappear (and a node might miss a delete for several reasons including 
being down or simply dropping requests during load shedding)
* If you cannot run repair and have to increase GCSeconds to prevent deleted 
data reappearing, in some cases the growing tombstone overhead can 
significantly degrade performance

Because of the foregoing, in high throughput environments it can be very 
difficult to make repair a cron job. It can be preferable to keep a terminal 
open and run repair jobs one by one, making sure they succeed and keeping an 
eye on overall load to reduce system impact. This isn't desirable, and problems 
are exacerbated when there are lots of column families in a database or it is 
necessary to run a column family with a low GCSeconds to reduce tombstone load 
(because there are many write/deletes to that column family). The database 
owner must run repair within the GCSeconds window, or increase GCSeconds, to 
avoid potentially losing delete operations. 

It would be much better if there was no ongoing requirement to run repair to 
ensure deletes aren't lost, and no GCSeconds window. Ideally repair would be an 
optional maintenance utility used in special cases, or to ensure ONE reads get 
consistent data. 

h2. Reaper Model Proposal

# Tombstones do not expire, and there is no GCSeconds
# Tombstones have associated ACK lists, which record the replicas that have 
acknowledged them
# Tombstones are deleted (or marked for compaction) when they have been 
acknowledged by all replicas
# When a tombstone is deleted, it is added to a relic index. The relic index 
makes it possible for a reaper to acknowledge a tombstone after it is deleted
# The ACK lists and relic index are held in memory for speed
# Background reaper threads constantly stream ACK requests to other nodes, 
and stream ACK responses back to requests they have received (throttling 
their usage of CPU and bandwidth so as not to affect performance)
# If a reaper receives a request to ACK a tombstone that does not exist, it 
creates the tombstone and adds an ACK for the requestor, and replies with an 
ACK. This is the worst that can happen, and does not cause data corruption. 

ADDENDUM

The proposal to hold the ACK and relic lists in memory was added after the 
first posting. Please see comments for full reasons. Furthermore, a proposal 
for enhancements to repair was posted to comments, which would cause tombstones 
to be scavenged when repair completes (the author had assumed this was the case 
anyway, but it seems at time of writing they are only scavenged during 
compaction on GCSeconds timeout). The proposals are not exclusive and this 
proposal is extended to include the possible enhancements to repair described.

NOTES

* If a node goes down for a prolonged period, the worst that can happen is that 
some tombstones are recreated across the cluster when it restarts, which does 
not corrupt data (and this will only occur with a very small number of 
tombstones)
* The system is simple to implement and predictable 
* With the reaper model, repair would become an optional process for optimizing 
the database to increase the consistency seen by ConsistencyLevel.ONE reads, 
and for fixing up nodes, for example after an sstable was lost

h3. Planned Benefits

* Reaper threads can utilize spare cycles to constantly scavenge tombstones 
in the background thereby greatly reducing tombstone load, improving query 
performance, reducing the system resources needed by processes such as 
compaction, and making performance generally more predictable 
* The reaper model means that GCSeconds is no longer necessary, which removes 
the threat of data corruption if repair can't be run successfully within that 
period (for example if repair can't be run because of a new adopter's lack of 
Cassandra expertise, a cron script failing, or Cassandra bugs or other 
technical issues)
* Reaper threads are fully automatic, work in the background and perform finely 
grained operations where 

[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs

2011-12-15 Thread Dominic Williams (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761
 ] 

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:19 PM:
---

Ok I got it and +1 on that idea. I had actually assumed tombstones were 
compacted away after repair anyway. So abandon GCSeconds and simply kill off 
tombstones created before repair when it runs successfully (presumably on a 
range-by-range basis?)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair not run

That would be a cool first step and improve the current situation. 

I think a reaper system is still needed though, although this feature would 
take some of the existing pressure off. There would still be the issue of 
tombstone build up between repairs, which means performance can vary (or 
actually, degrade) between invocations, the load spikes from repair itself and 
the manual nature of the process.

I guess I'm on the sharp end of this - we have several column families where 
columns represent game objects or messages owned by users where there is a high 
delete and insert load. Various operations need to perform slices of user rows 
and these can get much slower as tombstones build up, so GCSeconds has been 
brought right down, but this leads to the constant pain of omg how long left 
before need to run repair or increase GCSeconds etc.. improving repair as 
described would remove the Sword of Damocles threat of data corruption but we'd 
still need to make sure it was run regularly, performance would degrade between 
invocations and repair would create load spikes. The reaping model can take 
away those problems. 

  was (Author: dccwilliams):
Ok I got it and +1 on that idea. Abandon GCSeconds and simply kill of 
tombstones created before repair when it runs successfully (presumably on a 
range-by-range basis)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair not run

That would be a very cool first step to optimize this

I think a reaper system would still be well worthwhile though, although this 
feature would take some pressure off. There is still the issue of tombstone 
build up between repairs, which means performance can vary (or actually, 
degrade) between invocations plus there are still the load spikes from repair 
itself

I guess I'm on the sharp end of this - we have several column families where 
columns represent game objects or messages owned by users where there is a high 
delete and insert load. Various operations need to perform slices of user rows 
and these can get much slower as tombstones build up, so GCSeconds has been 
brought right down, but this leads to the constant pain of omg how long left 
before need to run repair or increase GCSeconds etc.. improving repair would 
remove the Sword of Damocles thing but we'd still need to run it regularly and 
performance wouldn't be as consistent it could be with constant background 
reaping
  
 Proposal for distributed deletes - fully automatic Reaper Model rather than 
 GCSeconds and manual repairs
 --

 Key: CASSANDRA-3620
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dominic Williams
  Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

 Proposal for an improved system for handling distributed deletes, which 
 removes the requirement to regularly run repair processes to maintain 
 performance and data integrity. 
 h2. The Problem
 There are various issues with repair:
 * Repair is expensive to run
 * Repair jobs are often made more expensive than they should be by other 
 issues (nodes dropping requests, hinted handoff not working, downtime etc)
 * Repair processes can often fail and need restarting, for example in cloud 
 environments where network issues make a node disappear from the ring for a 
 brief moment
 * When you fail to run repair within GCSeconds, either by error or because of 
 issues with Cassandra, data written to a node that did not see a later delete 
 can reappear (and a node might miss a delete for several reasons including 
 being down or simply dropping requests during load shedding)
 * If you cannot run repair and have to increase GCSeconds to prevent deleted 
 data reappearing, in some cases the growing tombstone overhead can 
 significantly degrade performance
 Because of the foregoing, in high throughput environments it can be very 
 difficult to make repair a cron job. It can be preferable to keep a terminal 
 

[jira] [Issue Comment Edited] (CASSANDRA-3620) Proposal for distributed deletes - fully automatic Reaper Model rather than GCSeconds and manual repairs

2011-12-15 Thread Dominic Williams (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169761#comment-13169761
 ] 

Dominic Williams edited comment on CASSANDRA-3620 at 12/15/11 2:22 PM:
---

Ok I got it and +1 on that idea. I had actually assumed tombstones were 
compacted away after repair anyway. So, as I understand it, GCSeconds would be 
removed, and tombstones would be marked for deletion once a repair operation 
was successfully run. 

That would be a cool first step and improve the current situation. 

But I think a reaper system is still needed: although this feature would take 
some of the current pressure off, there would still be the issue of tombstone 
build up between repairs, which means performance will degrade between 
invocations, the load spikes from repair itself and the manual nature of the 
process.

I guess I'm on the sharp end of this - we have several column families where 
columns represent game objects or messages owned by users where there is a high 
delete and insert load. Various operations need to perform slices of user rows 
and these can get much slower as tombstones build up, so GCSeconds has been 
brought right down, but this leads to the constant pain of omg how long left 
before need to run repair or increase GCSeconds etc.. improving repair as 
described would remove the Sword of Damocles threat of data corruption but we'd 
still need to make sure it was run regularly, performance would degrade between 
invocations and repair would create load spikes. The reaping model can take 
away those problems. 

  was (Author: dccwilliams):
Ok I got it and +1 on that idea. I had actually assumed tombstones were 
compacted away after repair anyway. So abandon GCSeconds and simply kill of 
tombstones created before repair when it runs successfully (presumably on a 
range-by-range basis?)
* Improved performance through reduced tombstone load
* No risk of data corruption if repair not run

That would be a cool first step and improve the current situation. 

I think a reaper system is still needed though, although this feature would 
take some of the existing pressure off. There would still be the issue of 
tombstone build up between repairs, which means performance can vary (or 
actually, degrade) between invocations, the load spikes from repair itself and 
the manual nature of the process.

I guess I'm on the sharp end of this - we have several column families where 
columns represent game objects or messages owned by users where there is a high 
delete and insert load. Various operations need to perform slices of user rows 
and these can get much slower as tombstones build up, so GCSeconds has been 
brought right down, but this leads to the constant pain of omg how long left 
before need to run repair or increase GCSeconds etc.. improving repair as 
described would remove the Sword of Damocles threat of data corruption but we'd 
still need to make sure it was run regularly, performance would degrade between 
invocations and repair would create load spikes. The reaping model can take 
away those problems. 
  
 Proposal for distributed deletes - fully automatic Reaper Model rather than 
 GCSeconds and manual repairs
 --

 Key: CASSANDRA-3620
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3620
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Dominic Williams
  Labels: GCSeconds, deletes, distributed_deletes, merkle_trees, repair
   Original Estimate: 504h
  Remaining Estimate: 504h

 Proposal for an improved system for handling distributed deletes, which 
 removes the requirement to regularly run repair processes to maintain 
 performance and data integrity. 
 h2. The Problem
 There are various issues with repair:
 * Repair is expensive to run
 * Repair jobs are often made more expensive than they should be by other 
 issues (nodes dropping requests, hinted handoff not working, downtime etc)
 * Repair processes can often fail and need restarting, for example in cloud 
 environments where network issues make a node disappear from the ring for a 
 brief moment
 * When you fail to run repair within GCSeconds, either by error or because of 
 issues with Cassandra, data written to a node that did not see a later delete 
 can reappear (and a node might miss a delete for several reasons including 
 being down or simply dropping requests during load shedding)
 * If you cannot run repair and have to increase GCSeconds to prevent deleted 
 data reappearing, in some cases the growing tombstone overhead can 
 significantly degrade performance
 Because of the foregoing, in high throughput environments it can be 

[jira] [Commented] (CASSANDRA-2475) Prepared statements

2011-12-15 Thread Eric Evans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170257#comment-13170257
 ] 

Eric Evans commented on CASSANDRA-2475:
---

bq. WFM. I assumed Rick had already implemented one for JDBC API completeness 
but if we're just going to no-op that out for now I'm not going to lose any 
sleep over it.

He did, but we removed it at an earlier stage of the review, for the reasons 
listed here (so if it's decided that we should have one, I'll do the work to 
put it back in).

bq. It's the client's responsibility to prepare the statements on each 
connection before using them, which implies some caching behavior on the part 
of the driver as in 
http://www.theserverside.com/news/1365244/Why-Prepared-Statements-are-important-and-how-to-use-them-properly

OK, that makes sense. Though it would seem to add another data point to the 
"an API to remove PSes isn't necessary" argument, since a close() on a pooled 
connection isn't going to remove the statement server-side anyway.
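
As a hedged illustration of the driver-side caching behaviour referred to above: the client prepares a statement lazily on each connection, remembers the returned id, and reuses it for later executions on that connection. The Connection interface below is hypothetical, not a real driver API.

{code:title=PerConnectionStatementCache.java (illustrative sketch)|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

final class PerConnectionStatementCache
{
    interface Connection
    {
        int prepare(String cql);                       // hypothetical: returns a statement id
        void execute(int statementId, Object... vars); // hypothetical: executes a prepared statement
    }

    private final Map<String, Integer> idsByQuery = new HashMap<String, Integer>();
    private final Connection connection;

    PerConnectionStatementCache(Connection connection) { this.connection = connection; }

    void execute(String cql, Object... vars)
    {
        Integer id = idsByQuery.get(cql);
        if (id == null)
        {
            id = connection.prepare(cql); // prepare lazily, once per connection
            idsByQuery.put(cql, id);
        }
        connection.execute(id, vars);
    }
}
{code}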

 Prepared statements
 ---

 Key: CASSANDRA-2475
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Affects Versions: 1.0.5
Reporter: Eric Evans
Assignee: Rick Shaw
Priority: Minor
  Labels: cql
 Fix For: 1.1

 Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 
 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, 
 v1-0002-regenerated-thrift-java.txt, 
 v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, 
 v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, 
 v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, 
 v2-0004-eevans-misc-cleanups.txt, 
 v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, 
 v2-0006-eevans-log-queries-at-TRACE.txt, 
 v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt








svn commit: r1214803 - /cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java

2011-12-15 Thread eevans
Author: eevans
Date: Thu Dec 15 15:05:08 2011
New Revision: 1214803

URL: http://svn.apache.org/viewvc?rev=1214803&view=rev
Log:
bump maximum cached prepared statements to 10,000 (from 50)

(and fix Map so that it is actually LRU)

Patch by evans for CASSANDRA-2475

Modified:
cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java?rev=1214803&r1=1214802&r2=1214803&view=diff
==
--- cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java 
(original)
+++ cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java Thu 
Dec 15 15:05:08 2011
@@ -43,7 +43,7 @@ import org.apache.cassandra.thrift.Inval
  */
 public class ClientState
 {
-    private static final int MAX_CACHE_PREPARED = 50;   // Ridiculously large, right?
+    private static final int MAX_CACHE_PREPARED = 10000; // Enough to keep buggy clients from OOM'ing us
     private static Logger logger = LoggerFactory.getLogger(ClientState.class);
 
     // Current user for the session
@@ -53,7 +53,7 @@ public class ClientState
     private final List<Object> resource = new ArrayList<Object>();
 
     // An LRU map of prepared statements
-    private Map<Integer, CQLStatement> prepared = new HashMap<Integer, CQLStatement>() {
+    private Map<Integer, CQLStatement> prepared = new LinkedHashMap<Integer, CQLStatement>(16, 0.75f, true) {
         protected boolean removeEldestEntry(Map.Entry<Integer, CQLStatement> eldest) {
             return size() > MAX_CACHE_PREPARED;
         }
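
As a standalone illustration of the access-ordered LinkedHashMap idiom used in 
the hunk above (the real map holds CQLStatement values; a String and a tiny cap 
stand in here so eviction is easy to see):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo
{
    private static final int MAX = 3;

    public static void main(String[] args)
    {
        // accessOrder=true makes iteration (and eviction) follow recency of use
        Map<Integer, String> lru = new LinkedHashMap<Integer, String>(16, 0.75f, true)
        {
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest)
            {
                return size() > MAX;
            }
        };

        for (int i = 1; i <= 4; i++)
            lru.put(i, "statement-" + i);   // inserting 4 evicts 1
        lru.get(2);                         // touching 2 marks it most recently used
        lru.put(5, "statement-5");          // now 3 is the eldest, so it is evicted

        // prints {4=statement-4, 2=statement-2, 5=statement-5}
        System.out.println(lru);
    }
}
{code}

With the plain HashMap the old code used, removeEldestEntry was never invoked 
and nothing was ever evicted, which is what the "fix Map so that it is actually 
LRU" part of the log message refers to.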




[Cassandra Wiki] Update of ArticlesAndPresentations by zznate

2011-12-15 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ArticlesAndPresentations page has been changed by zznate:
http://wiki.apache.org/cassandra/ArticlesAndPresentations?action=diffrev1=129rev2=130

   * [[http://www.emtg.net78.net/2011/10/21/cassandra_hector.html|Cassandra y 
Hector]], Spanish, October 2011
  
  = Presentations =
+  * 
[[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra
 pourles(ch'tis)Développeurs Java]] - Jérémy Sevellec, December 2011
   * [[http://www.slideshare.net/mattdennis/cassandra-data-modeling|Cassandra 
Data Modeling Workshop]] - Cassandra SF, Matthew F. Dennis, July 2011
   * 
[[http://www.slideshare.net/jeromatron/cassandrahadoop-integration|Cassandra/Hadoop
 Integration]] - Jeremy Hanna, January 2011
   * 
[[http://www.slideshare.net/supertom/using-cassandra-with-your-web-application|Using
 Cassandra with your Web Application]] - Tom Melendez, Oct 2010


[Cassandra Wiki] Update of ArticlesAndPresentations by zznate

2011-12-15 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ArticlesAndPresentations page has been changed by zznate:
http://wiki.apache.org/cassandra/ArticlesAndPresentations?action=diffrev1=130rev2=131

Comment:
adjust link name format

   * [[http://www.emtg.net78.net/2011/10/21/cassandra_hector.html|Cassandra y 
Hector]], Spanish, October 2011
  
  = Presentations =
-  * 
[[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra
 pourles(ch'tis)Développeurs Java]] - Jérémy Sevellec, December 2011
+  * 
[[http://www.slideshare.net/jsevellec/cassandra-pour-les-dveloppeurs-java|Cassandra
 pour les (ch'tis) Développeurs Java]] - Jérémy Sevellec, December 2011
   * [[http://www.slideshare.net/mattdennis/cassandra-data-modeling|Cassandra 
Data Modeling Workshop]] - Cassandra SF, Matthew F. Dennis, July 2011
   * 
[[http://www.slideshare.net/jeromatron/cassandrahadoop-integration|Cassandra/Hadoop
 Integration]] - Jeremy Hanna, January 2011
   * 
[[http://www.slideshare.net/supertom/using-cassandra-with-your-web-application|Using
 Cassandra with your Web Application]] - Tom Melendez, Oct 2010


[jira] [Updated] (CASSANDRA-3639) Move streams too many data

2011-12-15 Thread Fabien Rousseau (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabien Rousseau updated CASSANDRA-3639:
---

Attachment: 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch

 Move streams too many data
 --

 Key: CASSANDRA-3639
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Fabien Rousseau
Priority: Minor
 Attachments: 
 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch


 During a move operation, we observed that the node streamed most of its data 
 and received all its data.
 We are running Cassandra 0.8.7 (plus a few patches)
 After reading the code related to move, we found out that :
  - in StorageService.java, line 2002 and line 2004 => ranges are returned in 
 an unordered collection, but the calculateStreamAndFetchRanges() method (line 
 2011) assumes ranges are sorted, thus resulting in wider ranges being 
 fetched/streamed
 We managed to isolate and reproduce this in a unit test.
 We also propose a patch which :
  - does not rely on any sort
  - adds a few unit tests (may not be exhaustive...)
 Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. 
 For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, 
 but they probably should be moved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3637) data file size limit

2011-12-15 Thread Jonathan Ellis (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-3637.
---

Resolution: Not A Problem

LeveledCompactionStrategy addresses this. 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

 data file size limit
 

 Key: CASSANDRA-3637
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3637
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Zenek Kraweznik

 For a 100GB cassandra database (on a 500GB disk) I need another 100GB of space 
 for compacting (caused by large files; one of the data files is 80GB).
 Limiting the file size, for example to 5GB (the limit should be configurable), 
 I would need significantly less space for that operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3639) Move streams too many data

2011-12-15 Thread Fabien Rousseau (Created) (JIRA)
Move streams too many data
--

 Key: CASSANDRA-3639
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Fabien Rousseau
Priority: Minor
 Attachments: 
0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch

During a move operation, we observed that the node streamed most of its data 
and received all its data.

We are running Cassandra 0.8.7 (plus a few patches)

After reading the code related to move, we found out that :
 - in StorageService.java, line 2002 and line 2004 => ranges are returned in an 
unordered collection, but the calculateStreamAndFetchRanges() method (line 2011) 
assumes ranges are sorted, thus resulting in wider ranges being fetched/streamed

We managed to isolate and reproduce this in a unit test.
We also propose a patch which :
 - does not rely on any sort
 - adds a few unit tests (may not be exhaustive...)

Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. For 
the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, but 
they probably should be moved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3638) It may iterate the whole memtable while just query one row . This seriously affect the performance . of Cassandra

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170304#comment-13170304
 ] 

Jonathan Ellis commented on CASSANDRA-3638:
---

getRangeSlice is the "scan a lot of rows" method.  getColumnFamily is the 
"scan a single row" method.
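
A rough way to picture the difference, using plain java.util rather than 
Cassandra's internals: a point read is a single lookup, while a range read is a 
bounded iteration over a sorted view.

{code}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PointVsRangeRead
{
    public static void main(String[] args)
    {
        NavigableMap<String, String> rows = new TreeMap<String, String>();
        rows.put("a", "row-a");
        rows.put("b", "row-b");
        rows.put("c", "row-c");

        // single-row read: one lookup, no iteration
        System.out.println(rows.get("b"));

        // range read: bounded iteration over [b, c]
        for (Map.Entry<String, String> e : rows.subMap("b", true, "c", true).entrySet())
            System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
{code}

Routing a one-row query through the scan path is what the reporter is 
describing; the comment above points at the single-row read path instead.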

 It may iterate the whole memtable while just query one row . This seriously 
 affect the  performance . of Cassandra
 --

 Key: CASSANDRA-3638
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3638
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: MaHaiyang

 RangeSliceVerbHandler may only need to query one row, but Cassandra may 
 iterate the whole memtable.
 The problem is in the ColumnFamilyStore.getRangeSlice() method.
 {color:red} // this iterator may iterate the whole memtable!!{color}
 {code:title=ColumnFamilyStore.java|borderStyle=solid}
 public List<Row> getRangeSlice(ByteBuffer superColumn, final AbstractBounds range, int maxResults, IFilter columnFilter)
 throws ExecutionException, InterruptedException
 {
     ...
     DecoratedKey startWith = new DecoratedKey(range.left, null);
     DecoratedKey stopAt = new DecoratedKey(range.right, null);
     QueryFilter filter = new QueryFilter(null, new QueryPath(columnFamily, superColumn, null), columnFilter);
     int gcBefore = (int)(System.currentTimeMillis() / 1000) - metadata.getGcGraceSeconds();
     List<Row> rows;
     ViewFragment view = markReferenced(startWith, stopAt);
     try
     {
         CloseableIterator<Row> iterator = RowIteratorFactory.getIterator(view.memtables, view.sstables, startWith, stopAt, filter, getComparator(), this);
         rows = new ArrayList<Row>();
         try
         {
             // pull rows out of the iterator
             boolean first = true;
             while (iterator.hasNext()) // this iterator may iterate the whole memtable!!
             {
                 ...
             }
         }
         ...
     }
     ...
     return rows;
 }
 {code} 
 {color:red} // Querying just one row still returns a sublist of columnFamilies {color}
 {code:title=Memtable.java|borderStyle=solid}
 // Querying just one row still returns a sublist of columnFamilies
 public Iterator<Map.Entry<DecoratedKey, ColumnFamily>> getEntryIterator(DecoratedKey startWith)
 {
     return columnFamilies.tailMap(startWith).entrySet().iterator();
 }
 {code} 
 {color:red} // entry.getKey() will never be bigger than or equal to startKey, and 
 then the whole sublist of the memtable is iterated {color} 
 {code:title=RowIteratorFactory.java|borderStyle=solid}
 public IColumnIterator computeNext()
 {
     while (iter.hasNext())
     {
         Map.Entry<DecoratedKey, ColumnFamily> entry = iter.next();
         IColumnIterator ici = filter.getMemtableColumnIterator(entry.getValue(), entry.getKey(), comparator);
         // entry.getKey() will never be bigger than or equal to startKey, and
         // then the whole sublist of the memtable is iterated
         if (pred.apply(ici))
             return ici;
     }
     return endOfData();
 {code} 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170310#comment-13170310
 ] 

Jonathan Ellis commented on CASSANDRA-3635:
---

I don't think we should put this in 0.8.  The repair problems there are a lot 
deeper than this.  I'm fine with posting a backport patch if people want to run 
a custom build with it, but this shouldn't go in anything earlier than 1.0.  
(I'd prefer 1.1 TBH.)

Since compaction throughput does not include validation anymore, I'd prefer to 
default to something like 12/4 instead of effectively increasing the impact of 
compaction + repair out of the box.
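
A minimal sketch of what separate throttling means in practice, assuming a 
simple per-activity byte budget (illustrative only, not Cassandra's actual 
throttling code):

{code}
public class SimpleThrottle
{
    private final long bytesPerSecond;
    private long windowStart = System.nanoTime();
    private long bytesThisWindow = 0;

    public SimpleThrottle(long bytesPerSecond)
    {
        this.bytesPerSecond = bytesPerSecond;
    }

    // call once per chunk of work with the number of bytes processed
    public synchronized void acquire(long bytes) throws InterruptedException
    {
        long now = System.nanoTime();
        if (now - windowStart >= 1000000000L)
        {
            // a new one-second window starts
            windowStart = now;
            bytesThisWindow = 0;
        }
        bytesThisWindow += bytes;
        if (bytesThisWindow > bytesPerSecond)
        {
            // budget exhausted: sleep out the rest of the window
            long remainingNanos = 1000000000L - (now - windowStart);
            Thread.sleep(Math.max(0, remainingNanos / 1000000));
            windowStart = System.nanoTime();
            bytesThisWindow = 0;
        }
    }
}
{code}

Read "12/4" as separate instances, e.g. new SimpleThrottle(12L * 1024 * 1024) 
for ordinary compaction and new SimpleThrottle(4L * 1024 * 1024) for 
validation, so tightening one no longer tightens the other.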

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it together with other compaction, but there are cases where you 
 really want to throttle it rather aggressively but don't necessarily want 
 minor compactions throttled that much. The goal is to (optionally) allow 
 setting a separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3639) Move streams too many data

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3639:
--

 Reviewer: thepaul
Fix Version/s: 1.1

I'm not comfortable changing the move code in a stable release, but this sounds 
like a good change to make in 1.1.  Can you post a version of the patch that 
applies to trunk?

 Move streams too many data
 --

 Key: CASSANDRA-3639
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Fabien Rousseau
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch


 During a move operation, we observed that the node streamed most of its data 
 and received all its data.
 We are running Cassandra 0.8.7 (plus a few patches)
 After reading the code related to move, we found out that :
  - in StorageService.java, line 2002 and line 2004 => ranges are returned in 
 an unordered collection, but the calculateStreamAndFetchRanges() method (line 
 2011) assumes ranges are sorted, thus resulting in wider ranges being 
 fetched/streamed
 We managed to isolate and reproduce this in a unit test.
 We also propose a patch which :
  - does not rely on any sort
  - adds a few unit tests (may not be exhaustive...)
 Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. 
 For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, 
 but they probably should be moved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2475) Prepared statements

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170320#comment-13170320
 ] 

Jonathan Ellis commented on CASSANDRA-2475:
---

bq. it would seem to add another data-point to the "API to remove PSes isn't 
necessary" argument

Agreed, let's leave that out for now.

 Prepared statements
 ---

 Key: CASSANDRA-2475
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Affects Versions: 1.0.5
Reporter: Eric Evans
Assignee: Rick Shaw
Priority: Minor
  Labels: cql
 Fix For: 1.1

 Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 
 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, 
 v1-0002-regenerated-thrift-java.txt, 
 v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, 
 v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, 
 v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, 
 v2-0004-eevans-misc-cleanups.txt, 
 v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, 
 v2-0006-eevans-log-queries-at-TRACE.txt, 
 v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170322#comment-13170322
 ] 

Jonathan Ellis commented on CASSANDRA-1391:
---

That doesn't work, though: what if we have two updates at the same timestamp?  
I think it really does need to be content-based.

Also, I still think using Table.apply and CF.diff is the right way to do 
this, instead of effectively duplicating that code as a special case.  Are 
there any downsides to that approach I'm missing?
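
A sketch of what "content-based" could look like, assuming the schema can be 
serialized to a stable form (illustrative only, not the actual Migration code): 
the version is derived from the schema contents, so identical timestamps stop 
mattering and nodes holding the same schema always agree on the version.

{code}
import java.nio.charset.Charset;
import java.util.UUID;

public class ContentBasedVersion
{
    public static UUID versionOf(String serializedSchema)
    {
        // name-based (MD5, type 3) UUID: equal content always maps to the same id
        return UUID.nameUUIDFromBytes(serializedSchema.getBytes(Charset.forName("UTF-8")));
    }

    public static void main(String[] args)
    {
        UUID a = versionOf("ks1:{cf1, cf2}");
        UUID b = versionOf("ks1:{cf1, cf2}");
        UUID c = versionOf("ks1:{cf1, cf2, cf3}");
        System.out.println(a.equals(b)); // true: same content, same version
        System.out.println(a.equals(c)); // false: the schema changed
    }
}
{code}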

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
 Fix For: 1.1

 Attachments: 
 0001-new-migration-schema-and-avro-methods-cleanup.patch, 
 0002-avro-removal.patch, 0003-oldVersion-removed-nit-fixed.patch, 
 CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170323#comment-13170323
 ] 

Jonathan Ellis commented on CASSANDRA-3616:
---

Eric, do you also see correlation w/ repair operations?

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel

 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Eric Parusel (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170333#comment-13170333
 ] 

Eric Parusel commented on CASSANDRA-3616:
-

I haven't run a repair lately (no deletes at this time, I plan on setting up 
scheduled repairs though), but can run one to find out in the next few hours.

So far I've only correlated it with compaction.
I should note we're using LeveledCompactionStrategy.

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel

 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1391) Allow Concurrent Schema Migrations

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170336#comment-13170336
 ] 

Pavel Yaskevich commented on CASSANDRA-1391:


We could compare uuids instead in the isMergingMigration method.

How does a node determine whether it is ahead of or behind the ring with 
content-based versioning? Even if it can determine that, how do you find out 
which migrations a node needs to send/receive to get the ring in sync?

 Allow Concurrent Schema Migrations
 --

 Key: CASSANDRA-1391
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Stu Hood
Assignee: Pavel Yaskevich
 Fix For: 1.1

 Attachments: 
 0001-new-migration-schema-and-avro-methods-cleanup.patch, 
 0002-avro-removal.patch, 0003-oldVersion-removed-nit-fixed.patch, 
 CASSANDRA-1391.patch


 CASSANDRA-1292 fixed multiple migrations started from the same node to 
 properly queue themselves, but it is still possible for migrations initiated 
 on different nodes to conflict and leave the cluster in a bad state. Since 
 the system_add/drop/rename methods are accessible directly from the client 
 API, they should be completely safe for concurrent use.
 It should be possible to allow for most types of concurrent migrations by 
 converting the UUID schema ID into a VersionVectorClock (as provided by 
 CASSANDRA-580).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 
0003-CacheServiceMBean-and-correct-key-cache-loading.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 
0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 0001-global-key-cache.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 
0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 0004-key-row-cache-tests-and-tweaks.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 
0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: 0007-second-round-of-changes-according-to-Sylvain-comment.patch
0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch
0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch
0004-key-row-cache-tests-and-tweaks.patch
0003-CacheServiceMBean-and-correct-key-cache-loading.patch
0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch
0001-global-key-cache.patch

Rebased set of patches; all the changes from Sylvain's second comment are in 
patch #7.

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2475) Prepared statements

2011-12-15 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170360#comment-13170360
 ] 

Hudson commented on CASSANDRA-2475:
---

Integrated in Cassandra #1257 (See 
[https://builds.apache.org/job/Cassandra/1257/])
bump maximum cached prepared statements to 10,000 (from 50)

(and fix Map so that it is actually LRU)

Patch by evans for CASSANDRA-2475

eevans : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1214803
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/service/ClientState.java


 Prepared statements
 ---

 Key: CASSANDRA-2475
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2475
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Affects Versions: 1.0.5
Reporter: Eric Evans
Assignee: Rick Shaw
Priority: Minor
  Labels: cql
 Fix For: 1.1

 Attachments: 2475-v1.patch, 2475-v2.patch, 2475-v3.1.patch, 
 2475-v3.2-Thrift.patch, v1-0001-CASSANDRA-2475-prepared-statement-patch.txt, 
 v1-0002-regenerated-thrift-java.txt, 
 v2-0001-CASSANDRA-2475-rickshaw-2475-v3.1.patch.txt, 
 v2-0002-rickshaw-2475-v3.2-Thrift.patch-w-changes.txt, 
 v2-0003-eevans-increment-thrift-version-by-1-not-3.txt, 
 v2-0004-eevans-misc-cleanups.txt, 
 v2-0005-eevans-refactor-for-better-encapsulation-of-prepare.txt, 
 v2-0006-eevans-log-queries-at-TRACE.txt, 
 v2-0007-use-an-LRU-map-for-storage-of-prepared-statements.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170364#comment-13170364
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

bq. we cannot stream between two nodes, one using separate cf directory

I don't see any reason to continue to support the old-style directory layout.  
That adds complexity (operationally as well as in the code) for no benefit that 
I can think of.  I think we should migrate from old layout to new on the first 
startup under 1.1.

bq. regarding keyspaces in file names, sure, why not, guess having a header 
with this info in the file is out of the question, then the only meta data we 
have is the file name, right? A problem could be if we want to do 
CASSANDRA-1983 later, that would increase the file name length even more

I'm on the fence here -- on the one hand having ks + cf in the filename 
simplifies some things.  On the other hand, we allow arbitrary-length KS + CF 
names (up to 64K iirc) so UUID aside we're already in trouble on ext3/ext4, 
xfs, and ntfs, which all support max filename length of ~256.  I'm starting to 
think we should move these into the metadata component instead of the filename.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170367#comment-13170367
 ] 

Jonathan Ellis commented on CASSANDRA-3635:
---

Taking a step back, I'm not sure I see the benefit here.  If we're okay with X 
MB/s of i/o going on, doesn't that disrupt reads just as much whether that 
comes from repair validation or ordinary compaction?  

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it together with other compaction, but there are cases where you 
 really want to throttle it rather aggressively but don't necessarily want 
 minor compactions throttled that much. The goal is to (optionally) allow 
 setting a separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3628) Make Pig/CassandraStorage delete functionality disabled by default and configurable

2011-12-15 Thread Jeremy Hanna (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-3628:


Attachment: 3628.txt

Split out the conditions so it can do a noop for null values.  Not 100% certain 
that's the desired behavior - do we want to do that, or do we want to just write 
an empty value?  However, if we want to write an empty value, we have to convert 
the null to an empty value because of the NPEs that happen if we don't change 
it.

For our purposes, we want to skip columns whose values are null.  In our code we 
also log the column family name and the column name, but that might be up to 
the user who wants to do that - it adds a lot of logging.  Maybe people want 
that though.
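
A sketch of the skip-nulls behavior described above, with hypothetical types 
standing in for the Pig/CassandraStorage ones (not the attached patch itself):

{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class Column
{
    final ByteBuffer name;
    final ByteBuffer value;
    Column(ByteBuffer name, ByteBuffer value) { this.name = name; this.value = value; }
}

public class NullSkippingWriter
{
    // build the columns to write, no-op'ing (rather than deleting) on null values
    static List<Column> toColumns(List<String> names, List<Object> values, boolean logSkips)
    {
        List<Column> out = new ArrayList<Column>();
        for (int i = 0; i < names.size(); i++)
        {
            Object v = values.get(i);
            if (v == null)
            {
                if (logSkips)
                    System.err.println("skipping null value for column " + names.get(i));
                continue; // neither a delete nor an empty value
            }
            out.add(new Column(ByteBuffer.wrap(names.get(i).getBytes()),
                               ByteBuffer.wrap(v.toString().getBytes())));
        }
        return out;
    }
}
{code}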

 Make Pig/CassandraStorage delete functionality disabled by default and 
 configurable
 ---

 Key: CASSANDRA-3628
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3628
 Project: Cassandra
  Issue Type: Task
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
  Labels: pig
 Fix For: 1.0.7, 1.1

 Attachments: 3628.txt


 Right now, there is a way to delete a column with the CassandraStorage 
 loadstorefunc.  In practice it is a bad idea to have that enabled by default. 
  A scenario: do an outer join and you don't have a value for something and 
 then you write out to cassandra all of the attributes of that relation.  
 You've just inadvertently deleted a column for all the rows that didn't have 
 that value as a result of the outer join.  It can be argued that you want to 
 be careful with how you project after the join.  However, I would think 
 disabling by default and having a configurable property to enable it for the 
 instances when you explicitly want to use it is the right plan.
 Fwiw, we had a bug in one of our scripts that did exactly as described above. 
  It's good to fix the bug.  It's bad to implicitly delete data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType

2011-12-15 Thread Ed Anuff (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170383#comment-13170383
 ] 

Ed Anuff commented on CASSANDRA-3625:
-

I'm not sure we need a longer-term solution than what I'm proposing.  I think 
we're all in agreement that throwing the exception the way it does now is bad 
and that a deterministic, though not necessarily transparent, sort behavior is 
the best solution.  Sylvain, are you working on this one, or would you like me 
to take a stab at it?
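
A sketch of the deterministic-but-arbitrary ordering being discussed, using a 
hypothetical Component type (not the actual DynamicCompositeType code): same 
types compare by value as today, different types fall back to comparing the 
type name instead of throwing.

{code}
import java.util.Comparator;

class Component
{
    final String typeName;   // e.g. "IntegerType", "UTF8Type"
    final Comparable value;
    Component(String typeName, Comparable value) { this.typeName = typeName; this.value = value; }
}

class DeterministicComponentComparator implements Comparator<Component>
{
    @SuppressWarnings("unchecked")
    public int compare(Component a, Component b)
    {
        if (a.typeName.equals(b.typeName))
            return a.value.compareTo(b.value);   // comparable types: normal comparison
        // incomparable types: a fixed, predictable (if meaningless) order
        return a.typeName.compareTo(b.typeName);
    }
}
{code}

The order between, say, bar:24 and 0:uuid1 is still not meaningful, but it is 
stable, so a column inserted into the "wrong" row sorts consistently instead of 
blowing up at compare time.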

 Do something about DynamicCompositeType
 ---

 Key: CASSANDRA-3625
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne

 Currently, DynamicCompositeType is a super dangerous type. We cannot leave it 
 that way or people will get hurt.
 Let's recall that DynamicCompositeType allows composite column names without 
 any limitation on what each component type can be. It was added to basically 
 allow using different rows of the same column family to each store a 
 different index. So for instance you would have:
 {noformat}
 index1: {
   bar:24 -> someval
   bar:42 -> someval
   foo:12 -> someval
   ...
 }
 index2: {
   0:uuid1:3.2 -> someval
   1:uuid2:2.2 -> someval
   ...
 }
 
 {noformat}
 where index1, index2, ... are rows.
 So each row has columns whose names have a similar structure (so they can be 
 compared), but between rows the structure can be different (we never compare 
 two columns from two different rows).
 But the problem is the following: what happens if in the index1 row above, 
 you insert a column whose name is 0:uuid1? There is no really meaningful way 
 to compare bar:24 and 0:uuid1. The current implementation of 
 DynamicCompositeType, when confronted with this, says that it is a user error 
 and throws a MarshalException.
 The problem with that is that the exception is not thrown at insert time, and 
 it *cannot* be because of the dynamic nature of the comparator. But that 
 means that if you do insert the wrong column in the wrong row, you end up 
 *corrupting* an sstable.
 It is too dangerous a behavior. And it's probably made worse by the fact that 
 some people probably think that DynamicCompositeType should be superior to 
 CompositeType since, you know, it's dynamic.
 One solution to that problem could be to decide on some random (but 
 predictable) order between two incomparable components. For example we could 
 decide that IntType < LongType < StringType ...
 Note that even if we do that, I would suggest renaming 
 DynamicCompositeType to something that suggests that CompositeType is always 
 preferable to DynamicCompositeType unless you're really doing very advanced 
 stuff.
 Opinions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170385#comment-13170385
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

{quote}
On the other hand, we allow arbitrary-length KS + CF names (up to 64K iirc) so 
UUID aside we're already in trouble on ext3/ext4, xfs, and ntfs, which all 
support max filename length of ~256. I'm starting to think we should move these 
into the metadata component instead of the filename.
{quote}

The thing with the metadata component is that, from a code perspective, there 
are lots of places where we want to create a Descriptor, which involves 
extracting the keyspace/cf names based only on the filename. Adding the need to 
locate and read the metadata in those places will likely not be much fun.

So I'd be in favor of just limiting the keyspace and column family names. These 
are names for which there is no real point in being very long. Limiting each 
one to 32 characters shouldn't be a strong limitation.
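
A back-of-the-envelope check of the filename-length concern (the name format 
below only approximates the proposed ks+cf naming, and 32 is the per-name cap 
suggested above):

{code}
public class NameLengthCheck
{
    private static final int MAX_NAME = 32;      // suggested cap per keyspace/CF name
    private static final int MAX_FILENAME = 255; // typical ext3/ext4/xfs/ntfs limit

    static String dataFileName(String keyspace, String cf, int generation)
    {
        if (keyspace.length() > MAX_NAME || cf.length() > MAX_NAME)
            throw new IllegalArgumentException("keyspace/columnfamily name longer than " + MAX_NAME);
        return keyspace + "-" + cf + "-hb-" + generation + "-Data.db";
    }

    public static void main(String[] args)
    {
        String name = dataFileName("MyKeyspace", "messages_meta", 788904);
        System.out.println(name + " (" + name.length() + " of " + MAX_FILENAME + " chars)");
    }
}
{code}

With both names capped at 32 characters, the worst case stays well under the 
~255-character filesystem limit even with a version string and generation in 
the name.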

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3639) Move streams too many data

2011-12-15 Thread Fabien Rousseau (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabien Rousseau updated CASSANDRA-3639:
---

Attachment: 0001-try-to-fix-move-streaming-too-many-data-unit-tests-v2.patch

Sure.

This patch (v2) applies to trunk.
I also made a small modification in the tests (for the first patch, at the 
last moment, I changed the first token from 0 to 10, and this made most of the 
tests pass with the current code).


 Move streams too many data
 --

 Key: CASSANDRA-3639
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3639
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Fabien Rousseau
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-try-to-fix-move-streaming-too-many-data-unit-tests-v2.patch, 
 0001-try-to-fix-move-streaming-too-many-data-unit-tests.patch


 During a move operation, we observed that the node streamed most of its data 
 and received all its data.
 We are running Cassandra 0.8.7 (plus a few patches)
 After reading the code related to move, we found out that :
  - in StorageService.java, line 2002 and line 2004 => ranges are returned in 
 an unordered collection, but the calculateStreamAndFetchRanges() method (line 
 2011) assumes ranges are sorted, thus resulting in wider ranges being 
 fetched/streamed
 We managed to isolate and reproduce this in a unit test.
 We also propose a patch which :
  - does not rely on any sort
  - adds a few unit tests (may not be exhaustive...)
 Unit tests are done only for RF=2 and for the OldNetworkStrategyTopology. 
 For the sake of simplicity, we've put them in OldNetworkStrategyTopologyTest, 
 but they probably should be moved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170393#comment-13170393
 ] 

Brandon Williams commented on CASSANDRA-3616:
-

I can reproduce this with SizeTieredStrategy.

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel

 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170396#comment-13170396
 ] 

Jonathan Ellis commented on CASSANDRA-3616:
---

Just with compaction?  Did you get a debug log?

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel

 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170397#comment-13170397
 ] 

Sylvain Lebresne commented on CASSANDRA-3635:
-

I guess part of the idea is that validation is a bit cpu intensive (due to the 
SHA-256 hash it does), so this allows limiting that too without it being a 
problem for other compaction. It also allows giving more room to ordinary 
compactions, so that they complete earlier, which benefits reads (while having 
validation finish quickly is not necessarily as important).
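
For illustration only, here is a rough sketch of what a separate throughput 
target for validation could look like (this is not the attached patch; the 
class, enum and field names are made up):
{code}
// Minimal sketch: pick the throughput target by the type of compaction, then
// sleep off whatever time is needed to stay under it. Hypothetical names.
public final class ThrottleSketch
{
    public enum OpType { COMPACTION, VALIDATION }

    private final double compactionMBPerSec;
    private final double validationMBPerSec;
    private long lastCheckNanos = System.nanoTime();

    public ThrottleSketch(double compactionMBPerSec, double validationMBPerSec)
    {
        this.compactionMBPerSec = compactionMBPerSec;
        this.validationMBPerSec = validationMBPerSec;
    }

    /** Call from the compaction loop with the bytes written since the previous call. */
    public void maybeThrottle(OpType type, long bytesWritten) throws InterruptedException
    {
        double target = (type == OpType.VALIDATION) ? validationMBPerSec : compactionMBPerSec;
        if (target <= 0)
            return; // keep the convention that 0 means unthrottled

        long now = System.nanoTime();
        double expectedMillis = bytesWritten / (target * 1024 * 1024) * 1000;
        double elapsedMillis = (now - lastCheckNanos) / 1000000.0;
        if (expectedMillis > elapsedMillis)
            Thread.sleep((long) (expectedMillis - elapsedMillis)); // pay back the difference
        lastCheckNanos = System.nanoTime();
    }
}
{code}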

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it with other compaction, but there are cases where you really want 
 to throttle it rather aggressively but don't necessarily want to have minor 
 compactions throttled that much. The goal is to (optionally) allow setting a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170400#comment-13170400
 ] 

Jonathan Ellis commented on CASSANDRA-3143:
---

bq. I fail to see what is so crazy about having the function that saves the 
cache having access to both key and value. It may require a bit of refactoring, 
but I don't see that as a good argument. Anyway, it's not a very big deal but I 
still think that the two phase loading is more fragile than it needs to be, and 
saving values would allow a proper reload.

Why would you want to do a cache reload?  That's just going to be stale...  
Clearing the cache I can understand, but reloading a semi-arbitrary older cache 
state?  I don't see the value there.

ISTM we're talking about trading one kind of ugly code (passing around the Set 
of keys to load to SSTR) for another (a lot of code duplication between key 
cache, which wants to save values, and row cache, which doesn't).  It's also 
worth pointing out that if we're concerned about cache size, the two-phase 
approach gives smaller saved caches.  So I think I'd lean towards the existing, 
two-phase approach.
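
To make the trade-off concrete, here is a rough sketch of the two-phase idea 
under discussion, with made-up names (not the actual patch): the saved cache 
holds only keys, and positions are filled in from each sstable's index while 
it is opened.
{code}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class KeyCacheSketch
{
    // key -> position in the Data file (the real cache is global and keyed per sstable)
    private final Map<ByteBuffer, Long> positions = new ConcurrentHashMap<ByteBuffer, Long>();

    /** Phase 1: only the key set is written to the saved cache, so the file stays small. */
    Set<ByteBuffer> keysToSave()
    {
        return positions.keySet();
    }

    /** Phase 2: when an sstable is opened, positions for the saved keys are looked up in its index. */
    void populateFromIndex(Set<ByteBuffer> savedKeys, Map<ByteBuffer, Long> indexLookup)
    {
        for (ByteBuffer key : savedKeys)
        {
            Long position = indexLookup.get(key);
            if (position != null)   // only cache keys that actually exist in this sstable
                positions.put(key, position);
        }
    }

    Long getCachedPosition(ByteBuffer key)
    {
        return positions.get(key);
    }
}
{code}
The cost is exactly the part mentioned above: the set of keys-to-load has to 
be handed down to the reader.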

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3625) Do something about DynamicCompositeType

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170402#comment-13170402
 ] 

Sylvain Lebresne commented on CASSANDRA-3625:
-

bq. Right, but I thought we were positing that You Shouldn't Do That. In which 
case as long as it doesn't crash, I'm good

Without going into the debate of how useful or not that is, I think that as soon 
as it is allowed (to mix, in the same row, columns with components of different 
types), some will do it, so we'd rather choose the more coherent solution, and 
so I also prefer fixing some order on the types themselves and using that.

As for picking the actual sort between types, I'd prefer avoiding a hash, as 
it's not a very good use for a hash imo (I don't want to worry about collisions, 
as unlikely as they are). But using the alias character (and falling back to good 
old string comparison on the class name if there is no alias) seems fine to me.
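
To illustrate, a small sketch of that ordering with made-up names (not meant 
as the eventual implementation): compare on the alias character when both 
types have one, otherwise fall back to the comparator class name.
{code}
import java.util.Comparator;

final class ComponentTypeOrder implements Comparator<ComponentTypeOrder.TypeInfo>
{
    /** What we know about a component's type: an optional alias char and the class name. */
    static final class TypeInfo
    {
        final Character alias;   // e.g. 'b'; null if the type has no alias
        final String className;  // e.g. "org.apache.cassandra.db.marshal.BytesType"

        TypeInfo(Character alias, String className)
        {
            this.alias = alias;
            this.className = className;
        }
    }

    public int compare(TypeInfo a, TypeInfo b)
    {
        if (a.alias != null && b.alias != null)
            return a.alias.compareTo(b.alias);
        // good old string comparison on the class name when an alias is missing
        return a.className.compareTo(b.className);
    }
}
{code}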

bq. Sylvain, are you working on this one or would you like me to take a stab at 
it?

I haven't started writing anything so feel free to give it a shot.

 Do something about DynamicCompositeType
 ---

 Key: CASSANDRA-3625
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3625
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne

 Currently, DynamicCompositeType is a super dangerous type. We cannot leave it 
 that way or people will get hurt.
 Let's recall that DynamicCompositeType allows composite column names without 
 any limitation on what each component type can be. It was added to basically 
 allow different rows of the same column family to each store a 
 different index. So for instance you would have:
 {noformat}
 index1: {
   bar:24 -> someval
   bar:42 -> someval
   foo:12 -> someval
   ...
 }
 index2: {
   0:uuid1:3.2 -> someval
   1:uuid2:2.2 -> someval
   ...
 }
 
 {noformat}
 where index1, index2, ... are rows.
 So each row has columns whose names have a similar structure (so they can be 
 compared), but between rows the structure can be different (we never compare 
 two columns from two different rows).
 But the problem is the following: what happens if in the index1 row above, 
 you insert a column whose name is 0:uuid1 ? There is no really meaningful way 
 to compare bar:24 and 0:uuid1. The current implementation of 
 DynamicCompositeType, when confronted with this, says that it is a user error 
 and throws a MarshalException.
 The problem with that is that the exception is not thrown at insert time, and 
 it *cannot* be, because of the dynamic nature of the comparator. But that 
 means that if you do insert the wrong column in the wrong row, you end up 
 *corrupting* an sstable.
 It is too dangerous a behavior. And it's probably made worse by the fact that 
 some people probably think that DynamicCompositeType should be superior to 
 CompositeType since you know, it's dynamic.
 One solution to that problem could be to decide on some random (but 
 predictable) order between two incomparable components. For example we could 
 decide that IntType < LongType < StringType ...
 Note that even if we do that, I would suggest renaming the 
 DynamicCompositeType to something that suggests that CompositeType is always 
 preferable to DynamicCompositeType unless you're really doing very advanced 
 stuff.
 Opinions?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170403#comment-13170403
 ] 

Jonathan Ellis commented on CASSANDRA-3635:
---

If you're i/o bound under size-tiered compaction you're kind of screwed since 
it does such a poor job of actually bucketing the same rows together.

I think we should get some feedback of the "here's what my workload looks like 
and this diminishes my repair pain" nature before committing this.  Again, I'm 
fine with posting a 0.8 version of the patch if that helps.

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it with other compaction, but there are cases where you really want 
 to throttle it rather aggressively but don't necessarily want to have minor 
 compactions throttled that much. The goal is to (optionally) allow setting a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170404#comment-13170404
 ] 

Sylvain Lebresne commented on CASSANDRA-3143:
-

Alright. I don't really care about cache reloading either actually. The only 
thing I don't like with the two phase approach is that it populates the cache 
with -1 positions. If for any reason this doesn't get updated correctly, we'll 
end up with the cache wrongly saying that the key doesn't exist in the 
sstable. Of course there is no reason for the two phase approach to not work, 
but there is a part of me that doesn't like that a simple mess-up in the cache 
loading can make some keys inaccessible. Anyway, let's just not have bugs in 
there :)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170406#comment-13170406
 ] 

Sylvain Lebresne commented on CASSANDRA-3635:
-

bq. I think we should get some feedback of the "here's what my workload looks 
like and this diminishes my repair pain" nature before committing this.

I'm totally fine with that.

bq. Again, I'm fine with posting a 0.8 version of the patch if that helps.

The currently attached patch is against 0.8.

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it with other compaction, but there are cases where you really want 
 to throttle it rather aggressively but don't necessarily want to have minor 
 compactions throttled that much. The goal is to (optionally) allow setting a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN

2011-12-15 Thread Peter Schuller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170409#comment-13170409
 ] 

Peter Schuller commented on CASSANDRA-3626:
---

+1. :)

 Nodes can get stuck in UP state forever, despite being DOWN
 ---

 Key: CASSANDRA-3626
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.8, 1.0.5
Reporter: Peter Schuller
Assignee: Peter Schuller
 Attachments: 3626.txt


 This is a proposed phrasing for an upstream ticket named Newly discovered 
 nodes that are down get stuck in UP state forever (will edit w/ feedback 
 until done):
 We have observed a problem with gossip whereby, when you are bootstrapping a 
 new node (or replacing one using the replace_token support), any node in the 
 cluster which is Down at the time the new node is started will be assumed to be 
 Up and then *never ever* flapped back to Down until you restart the node.
 This has at least two implications for replacing or bootstrapping new nodes 
 when there are nodes down in the ring:
 * If the new node happens to select a node listed as UP (but in reality 
 DOWN) as a stream source, streaming will sit there hanging forever.
 * If that doesn't happen (by picking another host), it will instead finish 
 bootstrapping correctly, and begin servicing requests all the while thinking 
 DOWN nodes are UP, and thus routing requests to them, generating timeouts.
 The way to get out of this is to restart the node(s) that you bootstrapped.
 I have tested and confirmed the symptom (that the bootstrapped node thinks 
 other nodes are Up) using a fairly recent 1.0. The main debugging effort 
 happened on 0.8 however, so all details below refer to 0.8 but are probably 
 similar in 1.0.
 Steps to reproduce:
 * Bring up a cluster of >= 3 nodes. *Ensure RF is < N*, so that the cluster 
 is operative with one node removed.
 * Pick two random nodes A, and B. Shut them *both* off.
 * Wait for everyone to realize they are both off (for good measure).
 * Now, take node A and nuke its data directories and re-start it, such that 
 it comes up w/ normal bootstrap (or use replace_token; didn't test that but 
 should not affect it).
 * Watch how node A starts up, all the while believing node B is down, even 
 though all other nodes in the cluster agree that B is down and B is in fact 
 still turned off.
 The mechanism by which it initially goes into Up state is that the node 
 receives a gossip response from any other node in the cluster, and 
 GossipDigestAck2VerbHandler.doVerb() calls Gossiper.applyStateLocally().
 Gossiper.applyStateLocally() doesn't have any local endpoint state for the 
 cluster, so the else statement at the end (it's a new node) gets triggered 
 and handleMajorStateChange() is called. handleMajorStateChange() always calls 
 markAlive(), unless the state is a dead state (but dead here does not mean 
 not up, but refers to joining/hibernate etc).
 So at this point the node is up in the mind of the node you just bootstrapped.
 Now, in each gossip round doStatusCheck() is called, which iterates over all 
 nodes (including the one falsely Up) and among other things, calls 
 FailureDetector.interpret() on each node.
 FailureDetector.interpret() is meant to update its sense of Phi for the node, 
 and potentially convict it. However there is a short-circuit at the top, 
 whereby if we do not yet have any arrival window for the node, we simply 
 return immediately.
 Arrival intervals are only added as a result of a FailureDetector.report() 
 call, which never happens in this case because the initial endpoint state we 
 added, which came from a remote node that was up, had the latest version of 
 the gossip state (so Gossiper.reportFailureDetector() will never call 
 report()).
 The result is that the node can never ever be convicted.
 Now, let's ignore for a moment the problem that a node that is actually Down 
 will be thought to be Up temporarily for a little while. That is sub-optimal, 
 but let's aim for a fix to the more serious problem in this ticket - which is 
 that it stays up forever.
 Considered solutions:
 * When interpret() gets called and there is no arrival window, we could add a 
 faked arrival window far back in time to cause the node to have history and 
 be marked down. This works in the particular test case. The problem is that 
 since we are not ourselves actively trying to gossip to these nodes with any 
 particular speed, it might take a significant time before we get any kind of 
 confirmation from someone else that it's actually Up in cases where the node 
 actually *is* Up, so it's not clear that this is a good idea.
 * When interpret() gets called and 

[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170410#comment-13170410
 ] 

Pavel Yaskevich commented on CASSANDRA-3143:


How about we just change SSTableReader.getCachedPosition to return null if the 
value of the key cache was -1?
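
Something along these lines, sketched with a simplified signature and a 
hypothetical cache field (not the actual SSTableReader code):
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CachedPositionSketch
{
    private final ConcurrentMap<ByteBuffer, Long> keyCache = new ConcurrentHashMap<ByteBuffer, Long>();

    /** Treat the -1 placeholder as a miss so a half-loaded cache never claims a key is absent. */
    Long getCachedPosition(ByteBuffer key)
    {
        Long cached = keyCache.get(key);
        return (cached == null || cached == -1L) ? null : cached;
    }
}
{code}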

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170411#comment-13170411
 ] 

Jonathan Ellis commented on CASSANDRA-3143:
---

I'd rather go with the current approach of leaving the cache empty until we 
have real values for it, and pass SSTR a Set of keys-to-load.

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170415#comment-13170415
 ] 

Vijay commented on CASSANDRA-3635:
--

I think it will be much better if we can prioritize long-running compaction vs 
normal compaction. Let's say we have:

10MB Compaction limit
2MB Validation compaction limit

2MB is the limit for validation for a while, but when normal compaction 
kicks in we might want to hold the validation and let the compaction complete 
(because that will affect read performance), and continue with the validation 
compaction after that. By doing this we can set something like:

12MB Compaction limit
6MB Validation compaction limit

and still be within the HDD limit of 12MB.
The good thing about normal compaction is that it is spread out and not all the 
nodes are involved in it.

I am starting to think that we can do repairs one by one for a range (within a 
region), so the traffic doesn't get stuck waiting for the IO. Hope it makes 
sense.
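
As a rough sketch of that scheme (made-up names, and the numbers are just the 
ones from the example above): validation has its own limit but yields while a 
normal compaction is running, so the combined rate stays within the disk budget.
{code}
final class SharedBudgetSketch
{
    private final double diskBudgetMBPerSec = 12;   // HDD limit from the example
    private final double compactionMBPerSec = 12;   // normal compaction may use the whole budget
    private final double validationMBPerSec = 6;    // validation limit when compaction is idle

    private volatile boolean normalCompactionRunning = false;

    void compactionStarted()  { normalCompactionRunning = true; }
    void compactionFinished() { normalCompactionRunning = false; }

    /** Rate the validation compaction should target right now. */
    double validationTargetMBPerSec()
    {
        // hold validation while a normal compaction runs, then resume at its own limit
        return normalCompactionRunning ? 0 : Math.min(validationMBPerSec, diskBudgetMBPerSec);
    }
}
{code}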

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it with other compaction, but there are cases where you really want 
 to throttle it rather aggressively but don't necessarily want to have minor 
 compactions throttled that much. The goal is to (optionally) allow setting a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170414#comment-13170414
 ] 

Brandon Williams commented on CASSANDRA-3616:
-

CASSANDRA-3532 is the culprit here.

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel

 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170418#comment-13170418
 ] 

Pavel Yaskevich commented on CASSANDRA-3143:


I'm not a fan of that because we would need to drag read keys through all of 
the CFS and SSTableReaders :(

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1214916 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/gms/Gossiper.java

2011-12-15 Thread brandonwilliams
Author: brandonwilliams
Date: Thu Dec 15 19:10:36 2011
New Revision: 1214916

URL: http://svn.apache.org/viewvc?rev=1214916view=rev
Log:
Prevent new nodes from thinking down nodes are up forever.
Patch by brandonwilliams, reviewed by Peter Schuller for CASSANDRA-3626

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1214916r1=1214915r2=1214916view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Thu Dec 15 19:10:36 2011
@@ -1,3 +1,6 @@
+0.8.10
+ * prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626)
+
 0.8.9
  * remove invalid assertion that table was opened before dropping it
(CASSANDRA-3580)

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java?rev=1214916r1=1214915r2=1214916view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java
 Thu Dec 15 19:10:36 2011
@@ -831,7 +831,8 @@ public class Gossiper implements IFailur
 }
 else
 {
-// this is a new node
+// this is a new node, report it to the FD in case it is the 
first time we are seeing it AND it's not alive
+FailureDetector.instance.report(ep);
handleMajorStateChange(ep, remoteState);
 }
 }
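
For context, here is a simplified sketch of the behaviour described in the 
ticket (not the real FailureDetector code; names and structure are abbreviated): 
interpret() returns early when no arrival window exists for an endpoint, and 
windows are only created by report(), so reporting the endpoint when it is 
first learned about is what makes a later conviction possible.
{code}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FailureDetectorSketch
{
    static final class ArrivalWindow { /* inter-arrival samples used to compute phi */ }

    private final Map<InetAddress, ArrivalWindow> arrivalSamples =
            new ConcurrentHashMap<InetAddress, ArrivalWindow>();

    void report(InetAddress ep)
    {
        if (!arrivalSamples.containsKey(ep))
            arrivalSamples.put(ep, new ArrivalWindow());
        // the real code also records the arrival time so phi can be computed
    }

    void interpret(InetAddress ep)
    {
        ArrivalWindow window = arrivalSamples.get(ep);
        if (window == null)
            return;   // the short-circuit: without a report(), the node can never be convicted
        // ... compute phi from the window and convict the endpoint if it exceeds the threshold
    }
}
{code}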




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170417#comment-13170417
 ] 

Jonathan Ellis commented on CASSANDRA-3635:
---

bq. I am starting to think that we can do repairs one by one for a range 
(within a region

You mean if you have replicas A B C, comparing A and B before comparing A and 
C?  The downside there is you now have to validate twice, or they will be too 
out of sync.

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly ressource intensive. It is possible to 
 throttle it with other compaction, but there is cases where you really want 
 to throttle it rather aggressively but don't necessarily want to have minor 
 compactions throttled that much. The goal is to (optionally) allow to set a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1214918 - in /cassandra/branches/cassandra-1.0: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/gms/

2011-12-15 Thread brandonwilliams
Author: brandonwilliams
Date: Thu Dec 15 19:14:28 2011
New Revision: 1214918

URL: http://svn.apache.org/viewvc?rev=1214918view=rev
Log:
Merge 3626 from 0.8

Modified:
cassandra/branches/cassandra-1.0/   (props changed)
cassandra/branches/cassandra-1.0/CHANGES.txt
cassandra/branches/cassandra-1.0/contrib/   (props changed)

cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)

cassandra/branches/cassandra-1.0/src/java/org/apache/cassandra/gms/Gossiper.java

Propchange: cassandra/branches/cassandra-1.0/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Dec 15 19:14:28 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1211709
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1212854,1212938
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1212854,1212938,1214916
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/branches/cassandra-1.0:1167106,1167185

Modified: cassandra/branches/cassandra-1.0/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/CHANGES.txt?rev=1214918r1=1214917r2=1214918view=diff
==
--- cassandra/branches/cassandra-1.0/CHANGES.txt (original)
+++ cassandra/branches/cassandra-1.0/CHANGES.txt Thu Dec 15 19:14:28 2011
@@ -2,6 +2,8 @@
  * fix assertion when dropping a columnfamily with no sstables (CASSANDRA-3614)
  * more efficient allocation of small bloom filters (CASSANDRA-3618)
  * CLibrary.createHardLinkWithExec() to check for errors (CASSANDRA-3101)
+Merged from 0.8:
+ * prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626)
 
 1.0.6
  * (CQL) fix cqlsh support for replicate_on_write (CASSANDRA-3596)

Propchange: cassandra/branches/cassandra-1.0/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Dec 15 19:14:28 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1211709
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1212854,1212938
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1212854,1212938,1214916
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/branches/cassandra-1.0/contrib:1167106,1167185

Propchange: 
cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Dec 15 19:14:28 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1211709
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
-/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1212854,1212938
+/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1212854,1212938,1214916
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369
 
/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1125018
 
/cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1167106,1167185

Propchange: 
cassandra/branches/cassandra-1.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Dec 

[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170423#comment-13170423
 ] 

Jonathan Ellis commented on CASSANDRA-3143:
---

Again, that's what we're doing now, so I don't see it as *that* big a deal.  
But I'm good with either that approach, or the save-the-values-also approach.  I 
agree with Sylvain that keeping invalid values in the cache and replacing them 
later is a bad idea.

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.

2011-12-15 Thread Edward Capriolo (Created) (JIRA)
Dynamic Snitch does not compute scores if no direct reads hit the node.
---

 Key: CASSANDRA-3640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640
 Project: Cassandra
  Issue Type: Bug
Reporter: Edward Capriolo
Priority: Minor


We ran into an interesting situation. We added 2 nodes to our cluster. 
Strangely this node performed worse than other nodes. It had more IOwait for 
example. The impact was not major but it was noticeable. Later I determined 
that this Cassandra node was not in our client's list of nodes and our clients 
do not auto discover. I confirmed that the host did not have any scores inside 
its dynamic snitch.

It is counter intuitive that a node receiving less or no direct user requests 
would perform worse than others. I am not sure of the dynamic that caused this. 

I understand that DSnitch is supposed to have its own view of the world; maybe 
it could share information with neighbours. Again this is more of a client 
configuration issue than a direct Cassandra issue, but I found it quite 
interesting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN

2011-12-15 Thread Brandon Williams (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-3626:


 Reviewer: scode  (was: lenn0x)
Fix Version/s: 1.0.7
   0.8.10
 Assignee: Brandon Williams  (was: Peter Schuller)

 Nodes can get stuck in UP state forever, despite being DOWN
 ---

 Key: CASSANDRA-3626
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.8, 1.0.5
Reporter: Peter Schuller
Assignee: Brandon Williams
 Fix For: 0.8.10, 1.0.7

 Attachments: 3626.txt


 This is a proposed phrasing for an upstream ticket named Newly discovered 
 nodes that are down get stuck in UP state forever (will edit w/ feedback 
 until done):
 We have observed a problem with gossip whereby, when you are bootstrapping a 
 new node (or replacing one using the replace_token support), any node in the 
 cluster which is Down at the time the new node is started will be assumed to be 
 Up and then *never ever* flapped back to Down until you restart the node.
 This has at least two implications for replacing or bootstrapping new nodes 
 when there are nodes down in the ring:
 * If the new node happens to select a node listed as UP (but in reality 
 DOWN) as a stream source, streaming will sit there hanging forever.
 * If that doesn't happen (by picking another host), it will instead finish 
 bootstrapping correctly, and begin servicing requests all the while thinking 
 DOWN nodes are UP, and thus routing requests to them, generating timeouts.
 The way to get out of this is to restart the node(s) that you bootstrapped.
 I have tested and confirmed the symptom (that the bootstrapped node thinks 
 other nodes are Up) using a fairly recent 1.0. The main debugging effort 
 happened on 0.8 however, so all details below refer to 0.8 but are probably 
 similar in 1.0.
 Steps to reproduce:
 * Bring up a cluster of >= 3 nodes. *Ensure RF is < N*, so that the cluster 
 is operative with one node removed.
 * Pick two random nodes A, and B. Shut them *both* off.
 * Wait for everyone to realize they are both off (for good measure).
 * Now, take node A and nuke its data directories and re-start it, such that 
 it comes up w/ normal bootstrap (or use replace_token; didn't test that but 
 should not affect it).
 * Watch how node A starts up, all the while believing node B is down, even 
 though all other nodes in the cluster agree that B is down and B is in fact 
 still turned off.
 The mechanism by which it initially goes into Up state is that the node 
 receives a gossip response from any other node in the cluster, and 
 GossipDigestAck2VerbHandler.doVerb() calls Gossiper.applyStateLocally().
 Gossiper.applyStateLocally() doesn't have any local endpoint state for the 
 cluster, so the else statement at the end (it's a new node) gets triggered 
 and handleMajorStateChange() is called. handleMajorStateChange() always calls 
 markAlive(), unless the state is a dead state (but dead here does not mean 
 not up, but refers to joining/hibernate etc).
 So at this point the node is up in the mind of the node you just bootstrapped.
 Now, in each gossip round doStatusCheck() is called, which iterates over all 
 nodes (including the one falsely Up) and among other things, calls 
 FailureDetector.interpret() on each node.
 FailureDetector.interpret() is meant to update its sense of Phi for the node, 
 and potentially convict it. However there is a short-circuit at the top, 
 whereby if we do not yet have any arrival window for the node, we simply 
 return immediately.
 Arrival intervals are only added as a result of a FailureDetector.report() 
 call, which never happens in this case because the initial endpoint state we 
 added, which came from a remote node that was up, had the latest version of 
 the gossip state (so Gossiper.reportFailureDetector() will never call 
 report()).
 The result is that the node can never ever be convicted.
 Now, let's ignore for a moment the problem that a node that is actually Down 
 will be thought to be Up temporarily for a little while. That is sub-optimal, 
 but let's aim for a fix to the more serious problem in this ticket - which is 
 that it stays up forever.
 Considered solutions:
 * When interpret() gets called and there is no arrival window, we could add a 
 faked arrival window far back in time to cause the node to have history and 
 be marked down. This works in the particular test case. The problem is that 
 since we are not ourselves actively trying to gossip to these nodes with any 
 particular speed, it might take a significant time before we get any kind of 
 confirmation from someone else that it's actually Up in cases 

[jira] [Commented] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170427#comment-13170427
 ] 

Jonathan Ellis commented on CASSANDRA-3640:
---

I think I misunderstood on irc.  The dynamic snitch is populated based on 
client requests, so it's normal for that to be empty here.  It sounds like the 
real question is, were other nodes directing requests away from the poorly 
performing ones, and if not, what did their dsnitch contents look like?
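
For illustration, a loose sketch (hypothetical names, not the actual 
DynamicEndpointSnitch API) of why a node that never coordinates client 
requests ends up with no scores: it only accumulates latency samples for 
replicas of requests it proxies, so with no client traffic there is nothing 
to score.
{code}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DsnitchSketch
{
    private final Map<InetAddress, List<Double>> samples = new HashMap<InetAddress, List<Double>>();

    /** Called when a read coordinated by this node gets a reply from a replica. */
    void receiveTiming(InetAddress replica, double latencyMillis)
    {
        List<Double> list = samples.get(replica);
        if (list == null)
        {
            list = new ArrayList<Double>();
            samples.put(replica, list);
        }
        list.add(latencyMillis);
    }

    /** Average latency, or null when this node has never proxied a request to the replica. */
    Double score(InetAddress replica)
    {
        List<Double> list = samples.get(replica);
        if (list == null || list.isEmpty())
            return null; // "did not have any scores inside its dynamic snitch"
        double sum = 0;
        for (double d : list)
            sum += d;
        return sum / list.size();
    }
}
{code}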

 Dynamic Snitch does not compute scores if no direct reads hit the node.
 ---

 Key: CASSANDRA-3640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640
 Project: Cassandra
  Issue Type: Bug
Reporter: Edward Capriolo
Priority: Minor

 We ran into an interesting situation. We added 2 nodes to our cluster. 
 Strangely this node performed worse than other nodes. It had more IOwait for 
 example. The impact was not major but it was noticeable. Later I determined 
 that this Cassandra node was not in our client's list of nodes and our 
 clients do not auto discover. I confirmed that the host did not have any 
 scores inside its dynamic snitch.
 It is counter intuitive that a node receiving less or no direct user requests 
 would perform worse than others. I am not sure of the dynamic that caused 
 this. 
 I understand that DSnitch is supposed to have its own view of the world; 
 maybe it could share information with neighbours. Again this is more of a 
 client configuration issue than a direct Cassandra issue, but I found it 
 quite interesting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.

2011-12-15 Thread Edward Capriolo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-3640:
---

Affects Version/s: 0.8.7
   Issue Type: Improvement  (was: Bug)

 Dynamic Snitch does not compute scores if no direct reads hit the node.
 ---

 Key: CASSANDRA-3640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Edward Capriolo
Priority: Minor

 We ran into an interesting situation. We added 2 nodes to our cluster. 
 Strangely this node performed worse than other nodes. It had more IOwait for 
 example. The impact was not major but it was noticeable. Later I determined 
 that this Cassandra node was not in our client's list of nodes and our 
 clients do not auto discover. I confirmed that the host did not have any 
 scores inside its dynamic snitch.
 It is counter intuitive that a node receiving less or no direct user requests 
 would perform worse than others. I am not sure of the dynamic that caused 
 this. 
 I understand that DSnitch is supposed to have its own view of the world; 
 maybe it could share information with neighbours. Again this is more of a 
 client configuration issue than a direct Cassandra issue, but I found it 
 quite interesting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.

2011-12-15 Thread Edward Capriolo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-3640:
---

Description: 
We ran into an interesting situation. We added 2 nodes to our cluster. 
Strangely these nodes were performing worse than other nodes. They had more 
IOwait for example. The impact was not major but it was noticeable. Later I 
determined that these Cassandra nodes were not in our client's list of nodes and 
our clients do not auto discover. I confirmed that the hosts did not have any 
scores inside their dynamic snitch.

It is counter intuitive that a node receiving less or no direct user requests 
would perform worse than others. I am not sure of the dynamic that caused this. 

I understand that DSnitch is supposed to have its own view of the world; maybe 
it could share information with neighbours. Again this is more of a client 
configuration issue than a direct Cassandra issue, but I found it interesting.

  was:
We ran into an interesting situation. We added 2 nodes to our cluster. 
Strangely this node performed worse than other nodes. It had more IOwait for 
example. The impact was not major but it was noticeable. Later I determined 
that this Cassandra node was not in our client's list of nodes and our clients 
do not auto discover. I confirmed that the host did not have any scores inside 
its dynamic snitch.

It is counter intuitive that a node receiving less or no direct user requests 
would perform worse than others. I am not sure of the dynamic that caused this. 

I understand that DSnitch is supposed to have its own view of the world; maybe 
it could share information with neighbours. Again this is more of a client 
configuration issue than a direct Cassandra issue, but I found it quite 
interesting.


 Dynamic Snitch does not compute scores if no direct reads hit the node.
 ---

 Key: CASSANDRA-3640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Edward Capriolo
Priority: Minor

 We ran into an interesting situation. We added 2 nodes to our cluster. 
 Strangely, these nodes were performing worse than other nodes; they had more 
 IOwait, for example. The impact was not major, but it was noticeable. Later I 
 determined that these Cassandra nodes were not in our clients' list of nodes 
 and our clients do not auto-discover. I confirmed that the hosts did not have 
 any scores inside their dynamic snitches.
 It is counterintuitive that a node receiving fewer or no direct user requests 
 would perform worse than others. I am not sure of the dynamic that caused 
 this. 
 I understand that DSnitch is supposed to have its own view of the world; 
 maybe it could share information with its neighbours. Again, this is more of a 
 client configuration issue than a direct Cassandra issue, but I found it 
 interesting.
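
 For illustration, a minimal sketch (not the actual DynamicEndpointSnitch code; 
 all class and method names here are invented) of why a replica that never 
 serves direct reads ends up with no score: scores are derived from observed 
 read latencies, so a host with no samples has nothing to rank by.
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration: endpoints are only scored once they have latency samples.
public class SnitchScoreSketch
{
    private final Map<InetAddress, List<Double>> samples = new ConcurrentHashMap<InetAddress, List<Double>>();

    // Called after every direct read served by 'host'; a host that never
    // receives direct reads never reaches this method.
    public void receiveTiming(InetAddress host, double latencyMs)
    {
        List<Double> hostSamples = samples.get(host);
        if (hostSamples == null)
        {
            samples.putIfAbsent(host, Collections.synchronizedList(new ArrayList<Double>()));
            hostSamples = samples.get(host);
        }
        hostSamples.add(latencyMs);
    }

    // Hosts without samples get no score at all, so they cannot be ranked
    // against their neighbours.
    public Double scoreOf(InetAddress host)
    {
        List<Double> hostSamples = samples.get(host);
        if (hostSamples == null || hostSamples.isEmpty())
            return null;
        double sum = 0;
        for (double d : hostSamples)
            sum += d;
        return sum / hostSamples.size();
    }

    public static void main(String[] args) throws UnknownHostException
    {
        SnitchScoreSketch snitch = new SnitchScoreSketch();
        snitch.receiveTiming(InetAddress.getByName("10.0.0.1"), 3.2);
        System.out.println(snitch.scoreOf(InetAddress.getByName("10.0.0.1"))); // 3.2
        System.out.println(snitch.scoreOf(InetAddress.getByName("10.0.0.2"))); // null: no direct reads
    }
}
{code}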

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1214927 - in /cassandra/trunk: ./ contrib/ debian/ interface/thrift/gen-java/org/apache/cassandra/thrift/ pylib/cqlshlib/ src/java/org/apache/cassandra/db/ src/java/org/apache/cassandra/d

2011-12-15 Thread jbellis
Author: jbellis
Date: Thu Dec 15 19:34:40 2011
New Revision: 1214927

URL: http://svn.apache.org/viewvc?rev=1214927view=rev
Log:
merge from 1.0

Added:
cassandra/trunk/debian/cassandra-sysctl.conf
Removed:
cassandra/trunk/test/distributed/README.txt

cassandra/trunk/test/distributed/org/apache/cassandra/CassandraServiceController.java
cassandra/trunk/test/distributed/org/apache/cassandra/CountersTest.java
cassandra/trunk/test/distributed/org/apache/cassandra/MovementTest.java
cassandra/trunk/test/distributed/org/apache/cassandra/MutationTest.java
cassandra/trunk/test/distributed/org/apache/cassandra/TestBase.java
cassandra/trunk/test/distributed/org/apache/cassandra/utils/BlobUtils.java
cassandra/trunk/test/distributed/org/apache/cassandra/utils/KeyPair.java
Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/.rat-excludes
cassandra/trunk/CHANGES.txt
cassandra/trunk/NEWS.txt
cassandra/trunk/build.xml
cassandra/trunk/contrib/   (props changed)
cassandra/trunk/debian/changelog

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/pylib/cqlshlib/cqlhandling.py
cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java
cassandra/trunk/src/java/org/apache/cassandra/db/SystemTable.java
cassandra/trunk/src/java/org/apache/cassandra/db/migration/DropKeyspace.java
cassandra/trunk/src/java/org/apache/cassandra/gms/Gossiper.java
cassandra/trunk/src/java/org/apache/cassandra/gms/GossiperMBean.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
cassandra/trunk/test/cassandra.in.sh
cassandra/trunk/test/unit/org/apache/cassandra/service/RemoveTest.java

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Dec 15 19:34:40 2011
@@ -1,11 +1,12 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1211709
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1211976
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1212854,1212938
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
-/cassandra/branches/cassandra-1.0:1167085-1211978,1212284,1213775
+/cassandra/branches/cassandra-1.0:1167085-1213775
 
/cassandra/branches/cassandra-1.0.0:1167104-1167229,1167232-1181093,1181741,1181816,1181820,1182951,1183243
+/cassandra/branches/cassandra-1.0.5:1208016
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689
 /cassandra/tags/cassandra-0.8.0-rc1:1102511-1125020
 /incubator/cassandra/branches/cassandra-0.3:774578-796573

Modified: cassandra/trunk/.rat-excludes
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/.rat-excludes?rev=1214927r1=1214926r2=1214927view=diff
==
--- cassandra/trunk/.rat-excludes (original)
+++ cassandra/trunk/.rat-excludes Thu Dec 15 19:34:40 2011
@@ -29,3 +29,4 @@ drivers/txpy/txcql/cassandra/*
 drivers/py/cql/cassandra/*
 doc/cql/CQL*
 build.properties.default
+test/data/legacy-sstables/**

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1214927r1=1214926r2=1214927view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Thu Dec 15 19:34:40 2011
@@ -27,7 +27,12 @@
  * more efficient allocation of small bloom filters (CASSANDRA-3618)
 
 
+1.0.7
+ * fix assertion when dropping a columnfamily with no sstables (CASSANDRA-3614)
+
+
 1.0.6
+ * (CQL) fix cqlsh support for replicate_on_write (CASSANDRA-3596)
  * fix adding to leveled manifest after streaming (CASSANDRA-3536)
  * filter out unavailable cipher suites when using encryption (CASSANDRA-3178)
  * (HADOOP) add old-style api support for CFIF and CFRR (CASSANDRA-2799)
@@ -49,13 +54,20 @@
  * add back partitioner to sstable metadata (CASSANDRA-3540)
  * fix NPE in get_count for counters (CASSANDRA-3601)
 Merged from 0.8:
+ * remove invalid assertion that table was opened before dropping it
+   (CASSANDRA-3580)
+ * range and 

[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170438#comment-13170438
 ] 

Pavel Yaskevich commented on CASSANDRA-3143:


We are doing that now because we are able to read caches independently for each 
of the CFs, but with a global cache we would need to load that set on cache 
init and keep it through Schema.load as a global state. Why wouldn't changing 
SSTableReader.getCachedPosition to return null (and delete that key) if the 
value was -1 be the path of least resistance in this case?
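
A rough sketch of that path-of-least-resistance idea, assuming a cache whose 
values use -1 as a "not yet loaded" placeholder; the class and field names 
below are illustrative, not the real SSTableReader internals.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a cached position of -1 is treated as absent and evicted on read.
public class KeyCacheSketch
{
    private final ConcurrentMap<String, Long> positions = new ConcurrentHashMap<String, Long>();

    public void put(String key, long position)
    {
        positions.put(key, position);
    }

    // Mirrors the proposal for SSTableReader.getCachedPosition: return null
    // (and drop the entry) when the stored value is the -1 placeholder.
    public Long getCachedPosition(String key)
    {
        Long pos = positions.get(key);
        if (pos == null)
            return null;
        if (pos == -1L)
        {
            positions.remove(key, pos);
            return null;
        }
        return pos;
    }
}
{code}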

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Sylvain Lebresne (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-3616:


Attachment: 3616.patch

I believe the problem is that when compaction is over, we were creating an 
empty (and useless) writer. Previously we were just deleting it right away 
because the 'cleanupIfNeeded' call was in the finally block.

Patch attached to avoid creating it in the first place.
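
The shape of the fix, as a hedged sketch with invented names (not the actual 
CompactionTask code): create the writer lazily, only once there is something to 
append, so a compaction that produces nothing never leaves an empty -tmp- 
sstable behind.
{code:java}
import java.io.IOException;
import java.util.Iterator;

// Illustrative lazy-writer pattern: no rows written, no temp file created.
public class LazyWriterSketch
{
    interface Writer
    {
        void append(String row) throws IOException;
        void close() throws IOException;
    }

    interface WriterFactory
    {
        Writer create() throws IOException;
    }

    public static void compact(Iterator<String> rows, WriterFactory factory) throws IOException
    {
        Writer writer = null;
        try
        {
            while (rows.hasNext())
            {
                if (writer == null)          // only open the temp sstable when needed
                    writer = factory.create();
                writer.append(rows.next());
            }
        }
        finally
        {
            if (writer != null)              // nothing was written -> nothing to clean up
                writer.close();
        }
    }
}
{code}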

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel
 Attachments: 3616.patch


 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and has IDs leading up to, *but not including 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170446#comment-13170446
 ] 

Jonathan Ellis commented on CASSANDRA-3143:
---

Also note that while we have one global cache internally, there's nothing 
stopping us from splitting out the different CFs to different save files. In 
fact, that would be great from a backwards-compatibility point of view: there 
are users out there who would really hate to blow away their cache on upgrade, 
and preserving the save format would avoid the need for a backwards 
compatibility mode.
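
A hedged sketch of that idea with made-up identifiers: keep one in-memory 
cache, but at save time partition the keys by column family so each CF can keep 
its own save file in the existing format.
{code:java}
import java.util.*;

// Illustration only: one global map, per-CF save files derived by filtering keys.
public class GlobalCacheSaveSketch
{
    // A cache key that remembers which column family it belongs to.
    static class CacheKey
    {
        final String columnFamily;
        final String rowKey;
        CacheKey(String cf, String key) { this.columnFamily = cf; this.rowKey = key; }
    }

    public static Map<String, List<CacheKey>> groupByColumnFamily(Collection<CacheKey> globalKeys)
    {
        Map<String, List<CacheKey>> perCf = new HashMap<String, List<CacheKey>>();
        for (CacheKey k : globalKeys)
        {
            List<CacheKey> bucket = perCf.get(k.columnFamily);
            if (bucket == null)
            {
                bucket = new ArrayList<CacheKey>();
                perCf.put(k.columnFamily, bucket);
            }
            bucket.add(k);   // each bucket would be written to that CF's existing save file
        }
        return perCf;
    }
}
{code}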

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch, 
 0007-second-round-of-changes-according-to-Sylvain-comment.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-3143:
---

Attachment: (was: 
0007-second-round-of-changes-according-to-Sylvain-comment.patch)

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3143) Global caches (key/row)

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170450#comment-13170450
 ] 

Pavel Yaskevich commented on CASSANDRA-3143:


Yeah, I guess this is the best way to go. I will remove the #7 patch and 
re-attach it with those changes, to avoid pre-loading keys as well as keeping a 
global state.

 Global caches (key/row)
 ---

 Key: CASSANDRA-3143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3143
 Project: Cassandra
  Issue Type: Improvement
Reporter: Pavel Yaskevich
Assignee: Pavel Yaskevich
Priority: Minor
  Labels: Core
 Fix For: 1.1

 Attachments: 0001-global-key-cache.patch, 
 0002-global-row-cache-and-ASC.readSaved-changed-to-abstra.patch, 
 0003-CacheServiceMBean-and-correct-key-cache-loading.patch, 
 0004-key-row-cache-tests-and-tweaks.patch, 
 0005-cleanup-of-the-CFMetaData-and-thrift-avro-CfDef-and-.patch, 
 0006-row-key-cache-improvements-according-to-Sylvain-s-co.patch


 Caches are difficult to configure well as ColumnFamilies are added, similar 
 to how memtables were difficult pre-CASSANDRA-2006.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-1537) Add option (on CF) to remove expired column on minor compactions

2011-12-15 Thread Jonathan Ellis (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-1537.
---

   Resolution: Won't Fix
Fix Version/s: (was: 1.1)
 Assignee: (was: Sylvain Lebresne)

This doesn't seem urgent or useful enough to justify adding more options and 
complexity to the TTL code.

 Add option (on CF) to remove expired column on minor compactions
 

 Key: CASSANDRA-1537
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1537
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.1
Reporter: Sylvain Lebresne
Priority: Minor
   Original Estimate: 8h
  Remaining Estimate: 8h

 In some use cases, you can safely remove the tombstones of an expired column. 
 In theory, this is true whenever you know that you will never update a column 
 using a TTL strictly smaller than the one on the old column. This will be the 
 case, for instance, if you always use the same TTL on all the columns of a CF 
 (say you use the CF as a long-term persistent cache).
 I propose adding an option (per CF) that says 'always remove tombstones of 
 expired columns for that CF'.
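
 A sketch of what the proposed per-CF option would boil down to at compaction 
 time; the flag name and surrounding types are hypothetical, not part of any 
 attached patch.
{code:java}
// Hypothetical decision logic for a per-CF "purge expired columns on minor compaction" flag.
public class ExpiredColumnPurgeSketch
{
    static class Column
    {
        final long expiresAtMillis;   // <= 0 means no TTL
        Column(long expiresAt) { this.expiresAtMillis = expiresAt; }
        boolean isExpired(long now) { return expiresAtMillis > 0 && expiresAtMillis <= now; }
    }

    // With the flag on, an expired column can be dropped outright instead of
    // being carried forward as a tombstone.
    public static boolean shouldDrop(Column c, boolean purgeExpiredOnMinorCompaction, long now)
    {
        return purgeExpiredOnMinorCompaction && c.isExpired(now);
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis();
        Column expired = new Column(now - 1000);
        System.out.println(shouldDrop(expired, true, now));   // true: safe only if TTLs never shrink
        System.out.println(shouldDrop(expired, false, now));  // false: keep the tombstone
    }
}
{code}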

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2056) Need a way of flattening schemas.

2011-12-15 Thread Jonathan Ellis (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2056.
---

   Resolution: Invalid
Fix Version/s: (was: 1.1)
 Assignee: (was: Gary Dusbabek)

This is obsolete post-CASSANDRA-1391

 Need a way of flattening schemas.
 -

 Key: CASSANDRA-2056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2056
 Project: Cassandra
  Issue Type: Improvement
Reporter: Gary Dusbabek
Priority: Minor
 Attachments: v2-0001-convert-MigrationManager-into-a-singleton.txt, 
 v2-0002-bail-on-migrations-originating-from-newer-protocol-ver.txt, 
 v2-0003-a-way-to-upgrade-schema-when-protocol-version-changes.txt


 For all of our trying not to, we still managed to screw this up.  Schema 
 updates currently contain a serialized RowMutation stored as a column value.  
 When a node needs updated schema, it requests these values, deserializes them 
 and applies them.  As the serialization scheme for RowMutation changes over 
 time (this is inevitable), those old migrations will become incompatible with 
 newer implementations of the RowMutation deserializer.  This means that when 
 new nodes come online, they'll get migration messages that they have trouble 
 deserializing.  (Remember, we've only made the promise that we'll be 
 backwards compatible for one version--see CASSANDRA-1015--even though we'd 
 eventually have this problem without that guarantee.)
 What I propose is a cluster command to flatten the schema prior to upgrading. 
  This would basically purge the old schema updates and replace them with a 
 single serialized migration (serialized in the current protocol version).
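
 A very rough sketch of the flattening step under the assumptions above, using 
 placeholder types rather than the real Migration classes: the whole history is 
 replaced by one migration re-serialized at the current protocol version.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Placeholder model: collapse a history of migrations into one, serialized at the current version.
public class SchemaFlattenSketch
{
    static class Migration
    {
        final int protocolVersion;
        final String schemaSnapshot;   // stand-in for the full schema the migration produces
        Migration(int version, String snapshot) { this.protocolVersion = version; this.schemaSnapshot = snapshot; }
    }

    public static Migration flatten(List<Migration> history, int currentProtocolVersion)
    {
        if (history.isEmpty())
            throw new IllegalArgumentException("no schema to flatten");
        // The last migration already describes the full current schema;
        // re-emit it alone, serialized with the current protocol version.
        Migration latest = history.get(history.size() - 1);
        return new Migration(currentProtocolVersion, latest.schemaSnapshot);
    }

    public static void main(String[] args)
    {
        List<Migration> history = new ArrayList<Migration>();
        history.add(new Migration(2, "ks1: cf1"));
        history.add(new Migration(3, "ks1: cf1, cf2"));
        Migration flat = flatten(history, 5);
        System.out.println(flat.protocolVersion + " -> " + flat.schemaSnapshot);
    }
}
{code}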

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170457#comment-13170457
 ] 

Jonathan Ellis commented on CASSANDRA-2261:
---

I don't suppose you'd care to rebase to trunk?

 During Compaction, Corrupt SSTables with rows that cause failures should be 
 identified and blacklisted.
 ---

 Key: CASSANDRA-2261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Benjamin Coverston
Priority: Minor
  Labels: not_a_pony
 Fix For: 1.1

 Attachments: 2261.patch


 When a compaction of a set of SSTables fails because of corruption it will 
 continue to try to compact that SSTable causing pending compactions to build 
 up.
 One way to mitigate this problem would be to log the error, then identify the 
 specific SSTable that caused the failure, subsequently blacklisting that 
 SSTable and ensuring that it is no longer included in future compactions. For 
 this we could simply store the problematic SSTable's name in memory.
 If it's not possible to identify the SSTable that caused the issue, then 
 perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
 together is something that can be done to solve this problem in a more 
 general case, and avoid issues where two (or more) SSTables have trouble 
 compacting a particular row. For this option we would probably want to store 
 the lists of the bad combinations in the system table somewhere s.t. these 
 can survive a node failure (there have been a few cases where I have seen a 
 compaction cause a node failure).
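
 A minimal sketch of the in-memory blacklist described above (names are 
 illustrative): remember sstables that made a compaction fail and drop them 
 from future candidate sets.
{code:java}
import java.util.*;
import java.util.concurrent.CopyOnWriteArraySet;

// Illustrative blacklist: corrupt sstables are excluded from future compactions.
public class CompactionBlacklistSketch
{
    private final Set<String> blacklisted = new CopyOnWriteArraySet<String>();

    public void markCorrupt(String sstableName)
    {
        blacklisted.add(sstableName);   // in-memory only, so it survives until restart
    }

    public List<String> filterCandidates(Collection<String> candidates)
    {
        List<String> usable = new ArrayList<String>();
        for (String name : candidates)
            if (!blacklisted.contains(name))
                usable.add(name);
        return usable;
    }

    public static void main(String[] args)
    {
        CompactionBlacklistSketch bl = new CompactionBlacklistSketch();
        bl.markCorrupt("messages-hb-42-Data.db");
        System.out.println(bl.filterCandidates(Arrays.asList("messages-hb-41-Data.db", "messages-hb-42-Data.db")));
    }
}
{code}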

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2876) JDBC 1.1 Roadmap of Enhancements

2011-12-15 Thread Jonathan Ellis (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2876.
---

Resolution: Fixed
  Assignee: Rick Shaw

Resolving as fixed since the subtasks are, but really the JDBC driver has moved 
out-of-tree anyway.

 JDBC 1.1 Roadmap of Enhancements
 

 Key: CASSANDRA-2876
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2876
 Project: Cassandra
  Issue Type: Improvement
  Components: Drivers
Affects Versions: 0.8.1
Reporter: Rick Shaw
Assignee: Rick Shaw
Priority: Minor
  Labels: cql, jdbc
 Fix For: 1.1


 Organizational ticket to tie together the proposed improvements to 
 Cassandra's JDBC driver  in order to coincide with the 1.0 release of the 
 server-side product in the fall of 2011.
 The target list of improvements (in no particular order for the moment) are 
 as follows:
 # Complete the {{PreparedStatement}} functionality by implementing true 
 server side variable binding against pre-compiled CQL references.
 # Provide simple {{DataSource}} Support.
 # Provide a full {{PooledDataSource}} implementation that integrates the C* 
 JDBC driver with App Servers, JPA implementations and POJO Frameworks (like 
 Spring).
 # Add the {{BigDecimal}} datatype to the list of {{AbstractType}} classes to 
 complete the planned datatype support for {{PreparedStatement}} and 
 {{ResultSet}}.
 # Enhance the {{Driver}} features to support automatic error recovery and 
 reconnection.
 # Support {{RowId}} in {{ResultSet}}
 # Allow bi-directional row access scrolling  to complete functionality in the 
 {{ResultSet}}.
 # Deliver unit tests for each of the major components of the suite.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3024) sstable and message varint encoding

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170459#comment-13170459
 ] 

Jonathan Ellis commented on CASSANDRA-3024:
---

Are you still working on a patch for this, Terje?

 sstable and message varint encoding
 ---

 Key: CASSANDRA-3024
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3024
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1


 We could save some sstable space by encoding longs and ints as vlong and 
 vint, respectively.  (Probably most short lengths would be better as vint 
 as well.)
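
 For reference, a standard base-128 variable-length long encoding of the kind 
 this describes; this is the generic scheme, not necessarily the exact format 
 that would land in the sstable or message layer.
{code:java}
import java.io.*;

// Standard base-128 varint encoding: small values take fewer bytes than a fixed 8-byte long.
public class VarLongSketch
{
    public static void writeVLong(long value, DataOutput out) throws IOException
    {
        while ((value & ~0x7FL) != 0)
        {
            out.writeByte((int) ((value & 0x7F) | 0x80));   // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.writeByte((int) value);                          // final byte, continuation bit clear
    }

    public static long readVLong(DataInput in) throws IOException
    {
        long value = 0;
        int shift = 0;
        while (true)
        {
            byte b = in.readByte();
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return value;
            shift += 7;
        }
    }

    public static void main(String[] args) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeVLong(300, new DataOutputStream(bytes));
        System.out.println(bytes.size());   // 2 bytes instead of 8
        System.out.println(readVLong(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))));
    }
}
{code}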

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge

2011-12-15 Thread Peter Schuller (Created) (JIRA)
inconsistent/corrupt counters w/ broken shards never converge
-

 Key: CASSANDRA-3641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller


We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had 
counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption 
was that there would exist shards with the *same* node_id, *same* clock id, but 
*different* counts.

The counter column diffing and reconciliation code assumes that this never 
happens, and ignores the count. The problem with this is that if there is an 
inconsistency, the result of a reconciliation will depend on the order of the 
shards.

In our case for example, we would see the value of the counter randomly 
fluctuating on a CL.ALL read, but we would get consistent (whatever the node 
had) on CL.ONE (submitted to one of the nodes in the replica set for the key).

In addition, read repair would not work despite digest mismatches because the 
diffing algorithm also did not care about the counts when determining the 
differences to send.

I'm attaching patches that fix this. The first patch is against our 0.8 
branch, which is not terribly useful to people, but I include it because it is 
the well-tested version that we have used on the production cluster that was 
subject to this corruption.

The other patch is against trunk, and contains the same change.

What the patch does is:

* On diffing, treat as DISJOINT if there is a count discrepancy.
* On reconciliation, look at the count and *deterministically* pick the higher 
one, and:
** log the fact that we detected a corrupt counter
** increment a JMX observable counter for monitoring purposes

A cluster which is subject to such corruption and has this patch will fix 
itself with an AES + compact (or just repeated compactions, assuming 
replicate-on-compact is able to deliver correctly).
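
A sketch of the reconciliation rule, using a stripped-down shard model rather 
than the real serialized counter contexts the patches operate on: same node_id 
and clock but different counts is treated as corruption, and the higher count 
wins deterministically.
{code:java}
// Simplified shard model; the actual patches work on counter context byte buffers.
public class CounterShardReconcileSketch
{
    static class Shard
    {
        final long nodeId, clock, count;
        Shard(long nodeId, long clock, long count) { this.nodeId = nodeId; this.clock = clock; this.count = count; }
    }

    public static Shard reconcile(Shard a, Shard b)
    {
        if (a.nodeId != b.nodeId)
            throw new IllegalArgumentException("different nodes: nothing to reconcile");
        if (a.clock != b.clock)
            return a.clock > b.clock ? a : b;        // normal case: higher clock wins
        if (a.count != b.count)
        {
            // Corruption: same node and clock, different counts.
            // Deterministically keep the higher count (and, per the patch,
            // log it and bump a JMX-observable counter).
            System.err.println("invalid counter shard detected for node " + a.nodeId);
            return a.count > b.count ? a : b;
        }
        return a;                                    // identical shards
    }
}
{code}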


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

2011-12-15 Thread Benjamin Coverston (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170461#comment-13170461
 ] 

Benjamin Coverston commented on CASSANDRA-2261:
---

happily

 During Compaction, Corrupt SSTables with rows that cause failures should be 
 identified and blacklisted.
 ---

 Key: CASSANDRA-2261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benjamin Coverston
Assignee: Benjamin Coverston
Priority: Minor
  Labels: not_a_pony
 Fix For: 1.1

 Attachments: 2261.patch


 When a compaction of a set of SSTables fails because of corruption it will 
 continue to try to compact that SSTable causing pending compactions to build 
 up.
 One way to mitigate this problem would be to log the error, then identify the 
 specific SSTable that caused the failure, subsequently blacklisting that 
 SSTable and ensuring that it is no longer included in future compactions. For 
 this we could simply store the problematic SSTable's name in memory.
 If it's not possible to identify the SSTable that caused the issue, then 
 perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
 together is something that can be done to solve this problem in a more 
 general case, and avoid issues where two (or more) SSTables have trouble 
 compacting a particular row. For this option we would probably want to store 
 the lists of the bad combinations in the system table somewhere s.t. these 
 can survive a node failure (there have been a few cases where I have seen a 
 compaction cause a node failure).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3389) Evaluate CSLM alternatives for improved cache or GC performance

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170462#comment-13170462
 ] 

Jonathan Ellis commented on CASSANDRA-3389:
---

Can you test these under the G1 garbage collector?  CSLM is the main reason G1 
works poorly for us (especially the uses in Memtable).

 Evaluate CSLM alternatives for improved cache or GC performance
 ---

 Key: CASSANDRA-3389
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3389
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Brandon Williams
Priority: Minor
 Fix For: 1.1

 Attachments: 0001-Replace-CSLM-with-ConcurrentSkipTreeMap.patch, 
 0001-Switch-CSLM-to-SnapTree.patch


 Ben Manes commented on 
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance that 
 it's worth evaluating https://github.com/mspiegel/lockfreeskiptree and 
 https://github.com/nbronson/snaptree as CSLM replacements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge

2011-12-15 Thread Peter Schuller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller updated CASSANDRA-3641:
--

Attachment: 3641-0.8-internal-not-for-inclusion.txt
3641-trunk.txt

 inconsistent/corrupt counters w/ broken shards never converge
 -

 Key: CASSANDRA-3641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
 Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt


 We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had 
 counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption 
 was that there would exist shards with the *same* node_id, *same* clock id, 
 but *different* counts.
 The counter column diffing and reconciliation code assumes that this never 
 happens, and ignores the count. The problem with this is that if there is an 
 inconsistency, the result of a reconciliation will depend on the order of the 
 shards.
 In our case for example, we would see the value of the counter randomly 
 fluctuating on a CL.ALL read, but we would get consistent (whatever the node 
 had) on CL.ONE (submitted to one of the nodes in the replica set for the key).
 In addition, read repair would not work despite digest mismatches because the 
 diffing algorithm also did not care about the counts when determining the 
 differences to send.
 I'm attaching patches that fix this. The first patch is against our 0.8 
 branch, which is not terribly useful to people, but I include it because it 
 is the well-tested version that we have used on the production cluster that 
 was subject to this corruption.
 The other patch is against trunk, and contains the same change.
 What the patch does is:
 * On diffing, treat as DISJOINT if there is a count discrepancy.
 * On reconciliation, look at the count and *deterministically* pick the 
 higher one, and:
 ** log the fact that we detected a corrupt counter
 ** increment a JMX observable counter for monitoring purposes
 A cluster which is subject to such corruption and has this patch will fix 
 itself with an AES + compact (or just repeated compactions, assuming 
 replicate-on-compact is able to deliver correctly).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3641) inconsistent/corrupt counters w/ broken shards never converge

2011-12-15 Thread Peter Schuller (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schuller reassigned CASSANDRA-3641:
-

Assignee: Peter Schuller

 inconsistent/corrupt counters w/ broken shards never converge
 -

 Key: CASSANDRA-3641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3641
 Project: Cassandra
  Issue Type: Bug
Reporter: Peter Schuller
Assignee: Peter Schuller
 Attachments: 3641-0.8-internal-not-for-inclusion.txt, 3641-trunk.txt


 We ran into a case (which MIGHT be related to CASSANDRA-3070) whereby we had 
 counters that were corrupt (hopefully due to CASSANDRA-3178). The corruption 
 was that there would exist shards with the *same* node_id, *same* clock id, 
 but *different* counts.
 The counter column diffing and reconciliation code assumes that this never 
 happens, and ignores the count. The problem with this is that if there is an 
 inconsistency, the result of a reconciliation will depend on the order of the 
 shards.
 In our case for example, we would see the value of the counter randomly 
 fluctuating on a CL.ALL read, but we would get consistent (whatever the node 
 had) on CL.ONE (submitted to one of the nodes in the replica set for the key).
 In addition, read repair would not work despite digest mismatches because the 
 diffing algorithm also did not care about the counts when determining the 
 differences to send.
 I'm attaching patches that fix this. The first patch is against our 0.8 
 branch, which is not terribly useful to people, but I include it because it 
 is the well-tested version that we have used on the production cluster that 
 was subject to this corruption.
 The other patch is against trunk, and contains the same change.
 What the patch does is:
 * On diffing, treat as DISJOINT if there is a count discrepancy.
 * On reconciliation, look at the count and *deterministically* pick the 
 higher one, and:
 ** log the fact that we detected a corrupt counter
 ** increment a JMX observable counter for monitoring purposes
 A cluster which is subject to such corruption and has this patch will fix 
 itself with an AES + compact (or just repeated compactions, assuming 
 replicate-on-compact is able to deliver correctly).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3213) Upgrade Thrift

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3213:
--

Summary: Upgrade Thrift  (was: Upgrade Thrift to 0.7.0)

 Upgrade Thrift
 --

 Key: CASSANDRA-3213
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3213
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jake Farrell
Assignee: Jake Farrell
Priority: Trivial
  Labels: thrift
 Fix For: 1.2

 Attachments: v1-0001-update-generated-thrift-code.patch, 
 v1-0002-upgrade-thrift-jar-and-license.patch, v1-0003-update-build-xml.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2319) Promote row index

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2319:
--

Fix Version/s: (was: 1.1)

 Promote row index
 -

 Key: CASSANDRA-2319
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
  Labels: compression, index, timeseries
 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, 
 version-g-lzf.txt, version-g.txt


 The row index contains entries for configurably sized blocks of a wide row. 
 For a row of appreciable size, the row index ends up directing the third seek 
 (1. index, 2. row index, 3. content) to nearby the first column of a scan.
 Since the row index is always used for wide rows, and since it contains 
 information that tells us whether or not the 3rd seek is necessary (the 
 column range or name we are trying to slice may not exist in a given 
 sstable), promoting the row index into the sstable index would allow us to 
 drop the maximum number of seeks for wide rows back to 2, and, more 
 importantly, would allow sstables to be eliminated using only the index.
 An example usecase that benefits greatly from this change is time series data 
 in wide rows, where data is appended to the beginning or end of the row. Our 
 existing compaction strategy gets lucky and clusters the oldest data in the 
 oldest sstables: for queries to recently appended data, we would be able to 
 eliminate wide rows using only the sstable index, rather than needing to seek 
 into the data file to determine that it isn't interesting. For narrow rows, 
 this change would have no effect, as they will not reach the threshold for 
 indexing anyway.
 A first cut design for this change would look very similar to the file format 
 design proposed on #674: 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, 
 column names clustered, and offsets clustered and delta encoded.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2398) Type specific compression

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2398:
--

Fix Version/s: (was: 1.2)

 Type specific compression
 -

 Key: CASSANDRA-2398
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2398
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
  Labels: compression
 Attachments: 
 0001-CASSANDRA-2398-Add-type-specific-compression-to-Abstra.txt, 
 0002-CASSANDRA-2398-Type-specific-compression-for-counters.txt, 
 compress-lzf-0.7.0.jar


 Cassandra has a lot of locations that are ripe for type specific compression. 
 A short list:
 Indexes
  * Keys compressed as BytesType, which could default to LZO/LZMA
  * Offsets (delta and varint encoding)
  * Column names added by 2319
 Data
  * Keys, columns, timestamps: see 
 http://wiki.apache.org/cassandra/FileFormatDesignDoc
 A basic interface for type specific compression could be as simple as:
 {code:java}
 public void compress(int version, final List<ByteBuffer> from, DataOutput to) 
 throws IOException
 public void decompress(int version, DataInput from, List<ByteBuffer> to) 
 throws IOException
 public void skip(int version, DataInput from) throws IOException
 {code} 
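
 As one hypothetical instance of that interface (not part of any attached 
 patch), offsets could be stored as a count followed by varint-encoded deltas, 
 roughly along these lines:
{code:java}
import java.io.*;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical codec: a list of 8-byte offsets becomes a count plus varint-encoded deltas.
public class DeltaOffsetCodecSketch
{
    public void compress(int version, final List<ByteBuffer> from, DataOutput to) throws IOException
    {
        writeVLong(from.size(), to);
        long previous = 0;
        for (ByteBuffer b : from)
        {
            long offset = b.duplicate().getLong();   // assumes each buffer holds one long at position 0
            writeVLong(offset - previous, to);       // offsets are increasing, so deltas stay small
            previous = offset;
        }
    }

    public void decompress(int version, DataInput from, List<ByteBuffer> to) throws IOException
    {
        long count = readVLong(from);
        long previous = 0;
        for (long i = 0; i < count; i++)
        {
            previous += readVLong(from);
            ByteBuffer b = ByteBuffer.allocate(8);
            b.putLong(previous);
            b.flip();
            to.add(b);
        }
    }

    private static void writeVLong(long v, DataOutput out) throws IOException
    {
        while ((v & ~0x7FL) != 0) { out.writeByte((int) ((v & 0x7F) | 0x80)); v >>>= 7; }
        out.writeByte((int) v);
    }

    private static long readVLong(DataInput in) throws IOException
    {
        long v = 0;
        int shift = 0;
        while (true)
        {
            byte b = in.readByte();
            v |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return v;
            shift += 7;
        }
    }
}
{code}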

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3067) Simple SSTable Pluggability

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3067:
--

Fix Version/s: (was: 1.1)

 Simple SSTable Pluggability
 ---

 Key: CASSANDRA-3067
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3067
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
 Attachments: 
 0001-CASSANDRA-3067-Create-an-ABC-for-SSTableIdentityIterat.txt, 
 0002-CASSANDRA-3067-Move-from-linear-SSTable-versions-to-fe.txt, 
 0003-CASSANDRA-3067-Create-an-ABC-for-SSTableWriter.txt, 
 0004-CASSANDRA-3067-Rename-SSTable-Names-Slice-Iterator-to-.txt, 
 0005-CASSANDRA-3067-Create-ABCs-for-SSTableReader-and-KeyIt.txt, 
 0006-CASSANDRA-3067-Allow-overriding-the-current-sstable-ve.txt


 CASSANDRA-2995 proposes full storage engine pluggability, which is probably 
 unavoidable in the long run. For now though, I'd like to propose an 
 incremental alternative that preserves the sstable model, but allows it to 
 evolve non-linearly.
 The sstable version field could allow for simple switching between writable 
 sstable types, without moving all the way to differentiating between engines 
 as CASSANDRA-2995 requires. This can be accomplished by moving towards a 
 feature flags model (with a mapping between versions and feature sets), 
 rather than a linear versions model (where versions can be strictly ordered 
 and all versions above X have a feature).
 There are restrictions on this approach:
 * It's sufficient for an alternate SSTable(Writer|Reader|*) set to require a 
 patch to enable (rather than a JAR)
 * Filenames/descriptors/components must conform to the existing conventions
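
 A sketch of the feature-flags model with invented feature names: each version 
 string maps to an explicit capability set, instead of "every version above X 
 has the feature".
{code:java}
import java.util.*;

// Invented feature names; the point is the version -> feature-set mapping replacing ordered versions.
public class SSTableFeatureFlagsSketch
{
    enum Feature { PROMOTED_INDEX, COMPRESSION, TYPE_SPECIFIC_ENCODING }

    private static final Map<String, Set<Feature>> FEATURES = new HashMap<String, Set<Feature>>();
    static
    {
        FEATURES.put("hb", EnumSet.noneOf(Feature.class));
        FEATURES.put("hc", EnumSet.of(Feature.COMPRESSION));
        // An experimental writer could register its own version without being "above" or "below" others.
        FEATURES.put("xa", EnumSet.of(Feature.COMPRESSION, Feature.TYPE_SPECIFIC_ENCODING));
    }

    public static boolean supports(String version, Feature feature)
    {
        Set<Feature> features = FEATURES.get(version);
        return features != null && features.contains(feature);
    }

    public static void main(String[] args)
    {
        System.out.println(supports("hc", Feature.COMPRESSION));     // true
        System.out.println(supports("xa", Feature.PROMOTED_INDEX));  // false
    }
}
{code}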

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3213) Upgrade Thrift

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170464#comment-13170464
 ] 

Jonathan Ellis commented on CASSANDRA-3213:
---

Where was the Thrift wire-compatibility change? Was that 0.6 -> 0.7? If so, 
maybe we should upgrade to 0.7 for our 1.1 release so that people can use a 
modern Thrift client-side.

 Upgrade Thrift
 --

 Key: CASSANDRA-3213
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3213
 Project: Cassandra
  Issue Type: Task
  Components: Core
Reporter: Jake Farrell
Assignee: Jake Farrell
Priority: Trivial
  Labels: thrift
 Fix For: 1.2

 Attachments: v1-0001-update-generated-thrift-code.patch, 
 v1-0002-upgrade-thrift-jar-and-license.patch, v1-0003-update-build-xml.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Marcus Eriksson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170467#comment-13170467
 ] 

Marcus Eriksson commented on CASSANDRA-2749:


Sounds great (both just supporting the new-style layout and limiting names to 
32 chars).

I guess we need to supply a tool to rename sstable files if anyone is on longer 
names? And rolling upgrades are out of the question then, right? (Maybe they 
already are?)

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Pavel Yaskevich (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170475#comment-13170475
 ] 

Pavel Yaskevich commented on CASSANDRA-2749:


I think if we were just supporting the new-style layout we could convert on 
startup. +1 on both ideas, though.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2056) Need a way of flattening schemas.

2011-12-15 Thread Gary Dusbabek (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170478#comment-13170478
 ] 

Gary Dusbabek commented on CASSANDRA-2056:
--

bq. This is obsolete post-CASSANDRA-1391

This ticket is orthogonal to 1391.  The purpose of flattening schemas is to 
make it so we do not have to maintain compatibility for the migration 
serializers indefinitely (for the purpose of bootstrapping a new node).



 Need a way of flattening schemas.
 -

 Key: CASSANDRA-2056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2056
 Project: Cassandra
  Issue Type: Improvement
Reporter: Gary Dusbabek
Priority: Minor
 Attachments: v2-0001-convert-MigrationManager-into-a-singleton.txt, 
 v2-0002-bail-on-migrations-originating-from-newer-protocol-ver.txt, 
 v2-0003-a-way-to-upgrade-schema-when-protocol-version-changes.txt


 For all of our trying not to, we still managed to screw this up.  Schema 
 updates currently contain a serialized RowMutation stored as a column value.  
 When a node needs updated schema, it requests these values, deserializes them 
 and applies them.  As the serialization scheme for RowMutation changes over 
 time (this is inevitable), those old migrations will become incompatible with 
 newer implementations of the RowMutation deserializer.  This means that when 
 new nodes come online, they'll get migration messages that they have trouble 
 deserializing.  (Remember, we've only made the promise that we'll be 
 backwards compatible for one version--see CASSANDRA-1015--even though we'd 
 eventually have this problem without that guarantee.)
 What I propose is a cluster command to flatten the schema prior to upgrading. 
  This would basically purge the old schema updates and replace them with a 
 single serialized migration (serialized in the current protocol version).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Sylvain Lebresne (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170480#comment-13170480
 ] 

Sylvain Lebresne commented on CASSANDRA-2749:
-

bq. I guess we need to supply a tool to rename sstable files if anyone is on 
longer names?

We probably don't need to do anything. I don't think anyone is really using 
names long enough to hit the file system limit; the goal of limiting the names 
is just to prevent this from happening, and the code will make no other 
assumption that the names are short.

I also don't think anything will prevent rolling upgrades; did you have 
something in mind?

Note: I have a long flight ahead of me, so I plan to update my last patch with 
both of those changes, as I still like moving all of the directory handling 
into a dedicated class, even if we don't support both layouts.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3616) Temp SSTable and file descriptor leak

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170481#comment-13170481
 ] 

Jonathan Ellis commented on CASSANDRA-3616:
---

+1

 Temp SSTable and file descriptor leak
 -

 Key: CASSANDRA-3616
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3616
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.5
 Environment: 1.0.5 + CASSANDRA-3532 patch
 Solaris 10
Reporter: Eric Parusel
 Attachments: 3616.patch


 Discussion about this started in CASSANDRA-3532.  It's on its own ticket now.
 Anyhow:
 The nodes in my cluster are using a lot of file descriptors, holding open tmp 
 files. A few are using 50K+, nearing their limit (on Solaris, of 64K).
 Here's a small snippet of lsof:
 java 828 appdeployer *162u VREG 181,65540 0 333884 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776518-Data.db
 java 828 appdeployer *163u VREG 181,65540 0 333502 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776452-Data.db
 java 828 appdeployer *165u VREG 181,65540 0 333929 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776527-Index.db
 java 828 appdeployer *166u VREG 181,65540 0 333859 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776514-Data.db
 java 828 appdeployer *167u VREG 181,65540 0 333663 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776480-Data.db
 java 828 appdeployer *168u VREG 181,65540 0 333812 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 I spot checked a few and found they still exist on the filesystem too:
 -rw-r--r-- 1 appdeployer appdeployer 0 Dec 12 07:16 
 /data1/cassandra/data/MA_DDR/messages_meta-tmp-hb-776506-Index.db
 After more investigation, it seems to happen during a CompactionTask.
 I waited until I saw some -tmp- files hanging around in the data dir:
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Data.db
 -rw-r--r--   1 appdeployer appdeployer   0 Dec 12 21:47:10 2011 
 messages_meta-tmp-hb-788904-Index.db
 and then found this in the logs:
  INFO [CompactionExecutor:18839] 2011-12-12 21:47:07,173 CompactionTask.java 
 (line 113) Compacting 
 [SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760408-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760413-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760409-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-788314-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760407-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760412-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760410-Data.db'),
  
 SSTableReader(path='/data1/cassandra/data/MA_DDR/messages_meta-hb-760411-Data.db')]
 INFO [CompactionExecutor:18839] 2011-12-12 21:47:10,461 CompactionTask.java 
 (line 218) Compacted to 
 [/data1/cassandra/data/MA_DDR/messages_meta-hb-788896-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788897-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788898-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788899-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788900-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788901-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788902-Data.db,/data1/cassandra/data/MA_DDR/messages_meta-hb-788903-Data.db,].
   83,899,295 to 83,891,657 (~99% of original) bytes for 75,662 keys at 
 24.332518MB/s.  Time: 3,288ms.
 Note that the timestamp of the 2nd log line matches the last modified time of 
 the files, and that the compacted output has IDs leading up to, *but not 
 including, 788904*.
 I thought this might be relevant information, but I haven't found the 
 specific cause yet.
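
A minimal standalone sketch of that diagnosis (illustration only; the class, the regex and the naming assumptions are mine, not Cassandra code): scan a data directory, record the newest generation among live sstables, and list any leftover -tmp- files so their generations can be compared against it, just as the 788903/788904 comparison above does.

{code}
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TmpSSTableCheck
{
    // Matches e.g. messages_meta-hb-788896-Data.db and messages_meta-tmp-hb-788904-Data.db;
    // assumes the column family name itself contains no '-'.
    private static final Pattern NAME =
        Pattern.compile("[^-]+-(tmp-)?[a-z]+-(\\d+)-(Data|Index)\\.db");

    public static void main(String[] args)
    {
        File dir = new File(args[0]); // e.g. /data1/cassandra/data/MA_DDR (assumed to exist)
        long maxLiveGeneration = -1;

        // First pass: find the newest generation among live (non-tmp) sstables.
        for (File f : dir.listFiles())
        {
            Matcher m = NAME.matcher(f.getName());
            if (m.matches() && m.group(1) == null)
                maxLiveGeneration = Math.max(maxLiveGeneration, Long.parseLong(m.group(2)));
        }
        System.out.println("newest live sstable generation: " + maxLiveGeneration);

        // Second pass: report leftover -tmp- files and their generations for comparison.
        for (File f : dir.listFiles())
        {
            Matcher m = NAME.matcher(f.getName());
            if (m.matches() && m.group(1) != null)
                System.out.println("leftover tmp sstable (generation " + m.group(2) + "): " + f.getName());
        }
    }
}
{code}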

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1684) Entity groups

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170484#comment-13170484
 ] 

Jonathan Ellis commented on CASSANDRA-1684:
---

bq. if there were more optimizations done on rows (allowed them to be even 
larger, etc.), would that be a better approach? 

I think it would be.  That's definitely a long-term play, though.  I only have 
ideas on how to fix some of the problems Sylvain raised.  And then there are 
others like CASSANDRA-3362.

But we kind of need to fix large rows independently of the entity group idea.

bq. Two use cases where same row does not work for us:

Both of these sound like basically workarounds for weaknesses elsewhere, which 
again suggests the right answer is to fix those weaknesses rather than add 
another layer of hacks on top.

I guess there are really two questions here:
- Should we add a special row group API?
- What should the implementation look like?

In other words, we could add a row group API and implement it in terms of large 
rows, or implement it another way.  But we want wide rows that work well 
independently of row groups, so it feels like that's the right place to spend our 
efforts now.

 Entity groups
 -

 Key: CASSANDRA-1684
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1684
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
 Fix For: 1.2

   Original Estimate: 80h
  Remaining Estimate: 80h

 Supporting entity groups similar to App Engine's (that is, allowing rows to be 
 part of a parent entity group, whose key is used for routing instead of the 
 row itself) allows several improvements:
  - batches within an EG can be atomic across multiple rows
  - order-by-value queries within an EG only have to touch a single replica 
 even with RandomPartitioner

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3640) Dynamic Snitch does not compute scores if no direct reads hit the node.

2011-12-15 Thread Edward Capriolo (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170485#comment-13170485
 ] 

Edward Capriolo commented on CASSANDRA-3640:


I did happen to capture the snitch information from another node at the time of 
the event.

/10.71.71.51=3.15
/10.71.74.30=26.88
/10.71.71.62=1.67
/10.71.71.66=5.19
/10.71.71.73=3.76
/10.71.71.76=0.68
/10.71.74.34=1.66
/10.71.71.63=2.42
/10.71.71.72=0.82
/10.71.71.59=3.44
/10.71.74.33=1.21
/10.71.71.64=1.21
/10.71.71.60=2.19
/10.71.71.71=1.75
/10.71.74.32=106.55
/10.71.71.54=86.69
/10.71.71.53=5.14
/10.71.74.27=5.93
/10.71.74.31=3.11
/10.71.71.69=1.15
/10.71.71.56=2.73
/10.71.74.37=2.16
/10.71.71.70=2.85
/10.71.71.58=0.77
/10.71.71.55=5.83
/10.71.74.38=1.14
/10.71.74.35=3.61
/10.71.71.68=0.81
/10.71.71.67=0.69
/10.71.71.74=3.64
/10.71.71.57=1.21
/10.71.71.52=2.37
/10.71.71.65=6.78
/10.71.71.61=2.8
/10.71.71.75=4.12
/10.71.74.36=3.22

These are the two systems that I see getting a lot of IO:

/10.71.74.29=1.64
/10.71.74.28=3.49

We are doing mostly READ.ONE with a low read repair chance. The ring is balanced 
and data per node is the same. badness_threshold is 0.2.

No user load is hitting these two machines directly. Thus it is hard for me to 
understand how the dynamic snitch is routing so much traffic to these machines 
that they are more burdened than other nodes. As I understand the dynamic 
snitch, these machines should be the least burdened ones.
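
For reference, a rough sketch of how a badness_threshold like 0.2 is commonly understood to gate reordering (illustration only; this is not the actual DynamicEndpointSnitch code and the exact comparison may differ): the static/subsnitch ordering is kept unless the preferred replica's latency score is worse than the best replica's score by more than the threshold, in which case replicas are re-sorted by score.

{code}
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class BadnessThresholdSketch
{
    // replicas: subsnitch-preferred order; scores: lower is better (latency-derived).
    static void maybeReorder(List<String> replicas, Map<String, Double> scores, double badnessThreshold)
    {
        String preferred = replicas.get(0);
        double best = replicas.stream().mapToDouble(scores::get).min().orElse(0.0);

        // Only abandon the static ordering when the preferred replica is "bad enough"
        // relative to the best-scoring replica.
        if (scores.get(preferred) > best * (1 + badnessThreshold))
            replicas.sort(Comparator.comparingDouble(scores::get));
    }
}
{code}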



 Dynamic Snitch does not compute scores if no direct reads hit the node.
 ---

 Key: CASSANDRA-3640
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3640
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.8.7
Reporter: Edward Capriolo
Priority: Minor

 We ran into an interesting situation. We added 2 nodes to our cluster. 
 Strangely, these nodes were performing worse than other nodes; they had more 
 IOwait, for example. The impact was not major, but it was noticeable. Later I 
 determined that these Cassandra nodes were not in our clients' list of nodes 
 and our clients do not auto-discover. I confirmed that the hosts did not have 
 any scores inside their dynamic snitch.
 It is counterintuitive that a node receiving few or no direct user requests 
 would perform worse than others. I am not sure of the dynamic that caused 
 this.
 I understand that the DSnitch is supposed to have its own view of the world; 
 maybe it could share information with neighbours. Again, this is more of a 
 client configuration issue than a direct Cassandra issue, but I found it 
 interesting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-12-15 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170487#comment-13170487
 ] 

Jonathan Ellis commented on CASSANDRA-2749:
---

It might be worth adding an "are my filenames going to be too large" check 
against all KS + CF combinations before starting to migrate data files around, 
though.  It would suck to end up with a partially converted database if some 
short CF names complete early on, before erroring out on a long one.
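
A sketch of what such a pre-flight check could look like (the naming scheme, helper signature and 255-character limit are assumptions for illustration, not the actual patch): build the longest filename each keyspace/CF combination would produce and fail before anything is moved.

{code}
import java.util.Map;
import java.util.Set;

class FilenameLengthPreCheck
{
    // 255 bytes is the usual per-component limit on common Linux filesystems (assumption).
    private static final int MAX_COMPONENT_LENGTH = 255;

    // Hypothetical pre-flight pass over every keyspace -> set of column families.
    static void validate(Map<String, Set<String>> columnFamiliesByKeyspace)
    {
        for (Map.Entry<String, Set<String>> ks : columnFamiliesByKeyspace.entrySet())
        {
            for (String cf : ks.getValue())
            {
                // Assumed worst-case component name; the real naming scheme may differ.
                String candidate = ks.getKey() + "-" + cf + "-hc-2147483647-CompressionInfo.db";
                if (candidate.length() > MAX_COMPONENT_LENGTH)
                    throw new IllegalStateException(
                        "Filename would be too long for " + ks.getKey() + "." + cf + ": " + candidate);
            }
        }
    }
}
{code}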

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 
 0001-add-new-directory-layout.patch, 
 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 
 0002-fix-unit-tests.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 
 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 
 2749_backwards_compatible_v4.patch, 
 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 
 2749_proper.tar.gz


 Currently Cassandra supports multiple data directories, but there is no way to 
 control which sstables are placed where. Particularly for systems with mixed 
 SSDs and rotational disks, it would be nice to pin frequently accessed 
 columnfamilies to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html), but we 
 should probably avoid using that name because of its confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ

2011-12-15 Thread Sylvain Lebresne (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne reassigned CASSANDRA-3636:
---

Assignee: Brandon Williams

 cassandra 1.0.6 debian packages will not run on OpenVZ
 --

 Key: CASSANDRA-3636
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
Affects Versions: 1.0.6
 Environment: Debian Linux (stable), OpenVZ container
Reporter: Zenek Kraweznik
Assignee: Brandon Williams
Priority: Critical

 During upgrade from 1.0.6
 {code}Setting up cassandra (1.0.6) ...
 *error: permission denied on key 'vm.max_map_count'*
 dpkg: error processing cassandra (--configure):
  subprocess installed post-installation script returned error exit status 255
 Errors were encountered while processing:
  cassandra
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3636) cassandra 1.0.6 debian packages will not run on OpenVZ

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3636:
--

Priority: Minor  (was: Critical)

 cassandra 1.0.6 debian packages will not run on OpenVZ
 --

 Key: CASSANDRA-3636
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3636
 Project: Cassandra
  Issue Type: Bug
  Components: Packaging
Affects Versions: 1.0.6
 Environment: Debian Linux (stable), OpenVZ container
Reporter: Zenek Kraweznik
Assignee: Brandon Williams
Priority: Minor

 During upgrade from 1.0.6
 {code}Setting up cassandra (1.0.6) ...
 *error: permission denied on key 'vm.max_map_count'*
 dpkg: error processing cassandra (--configure):
  subprocess installed post-installation script returned error exit status 255
 Errors were encountered while processing:
  cassandra
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3635) Throttle validation separately from other compaction

2011-12-15 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170490#comment-13170490
 ] 

Vijay commented on CASSANDRA-3635:
--

Nope, I think we can create a tree independently on each of the nodes and then 
compare them.

Let's say we create a tree on A first; after it completes, we can create a tree 
on B and then on C (we have to sync on time, maybe by flushing at the time the 
repair was requested or something like that).

Once we have all 3 trees we can compare them and transfer what is required: we 
exchange the trees and then start the real streaming if needed. That way we 
don't bring the whole range down or make it hot.
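
A rough sketch of that sequence, with assumed helper interfaces rather than the real repair/validation classes: build the validation tree on one replica at a time, then compare the trees pairwise and stream only the ranges that actually differ.

{code}
import java.util.ArrayList;
import java.util.List;

class SequentialRepairSketch
{
    interface Replica
    {
        MerkleTreeStub buildValidationTree(String range); // assumed: runs a local validation compaction
        void streamDifferences(Replica peer, List<String> mismatchedRanges);
    }

    interface MerkleTreeStub
    {
        List<String> difference(MerkleTreeStub other); // assumed: sub-ranges whose hashes differ
    }

    static void repairRange(String range, List<Replica> replicas)
    {
        // 1. Validate one replica at a time so only one node is busy (hot) at any moment.
        List<MerkleTreeStub> trees = new ArrayList<>();
        for (Replica r : replicas)
            trees.add(r.buildValidationTree(range));

        // 2. Compare every pair of trees and stream only the mismatching ranges.
        for (int i = 0; i < replicas.size(); i++)
        {
            for (int j = i + 1; j < replicas.size(); j++)
            {
                List<String> diffs = trees.get(i).difference(trees.get(j));
                if (!diffs.isEmpty())
                    replicas.get(i).streamDifferences(replicas.get(j), diffs);
            }
        }
    }
}
{code}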

 Throttle validation separately from other compaction
 

 Key: CASSANDRA-3635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3635
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: repair
 Fix For: 0.8.10, 1.0.7

 Attachments: 0001-separate-validation-throttling.patch


 Validation compaction is fairly resource intensive. It is possible to 
 throttle it along with other compaction, but there are cases where you really 
 want to throttle it rather aggressively without necessarily having minor 
 compactions throttled that much. The goal is to (optionally) allow setting a 
 separate throttling value for validation.
 PS: I'm not pretending this will solve every repair problem or anything. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

2011-12-15 Thread Radim Kolar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170529#comment-13170529
 ] 

Radim Kolar commented on CASSANDRA-3497:


It would be good to have the ability to shrink bloom filters during loading: 
save only the standard Cassandra bloom filters, but shrink them during load 
according to the CF settings.
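
One way such shrinking could work is "folding" the serialized filter down to a smaller bit count at load time. A minimal sketch, assuming the new size divides the old one (illustration only, not Cassandra's BloomFilter class): when m' divides m, h % m' equals (h % m) % m', so OR-ing bit i into bit i % m' keeps lookups correct at the cost of a higher false-positive rate.

{code}
import java.util.BitSet;

class BloomFilterFolding
{
    // Fold a filter of oldSize bits down to oldSize / foldFactor bits.
    static BitSet fold(BitSet bits, int oldSize, int foldFactor)
    {
        if (oldSize % foldFactor != 0)
            throw new IllegalArgumentException("fold factor must divide the original size");

        int newSize = oldSize / foldFactor;
        BitSet folded = new BitSet(newSize);

        // OR every set bit of the original filter into its position modulo the new size.
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1))
            folded.set(i % newSize);

        return folded;
    }
}
{code}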

 BloomFilter FP ratio should be configurable or size-restricted some other way
 -

 Key: CASSANDRA-3497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Brandon Williams
Priority: Minor

 When you have a live DC and a purely analytical DC, in many situations you can 
 have fewer nodes on the analytical side, but end up getting restricted by 
 having the BloomFilters in memory, even though you have absolutely no use for 
 them.  It would be nice if you could reduce this memory requirement by tuning 
 the desired FP ratio, or even just disabling them altogether.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

2011-12-15 Thread Jonathan Ellis (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3497:
--

Fix Version/s: 1.1
 Assignee: Yuki Morishita

 BloomFilter FP ratio should be configurable or size-restricted some other way
 -

 Key: CASSANDRA-3497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Brandon Williams
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 1.1


 When you have a live DC and a purely analytical DC, in many situations you can 
 have fewer nodes on the analytical side, but end up getting restricted by 
 having the BloomFilters in memory, even though you have absolutely no use for 
 them.  It would be nice if you could reduce this memory requirement by tuning 
 the desired FP ratio, or even just disabling them altogether.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3626) Nodes can get stuck in UP state forever, despite being DOWN

2011-12-15 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13170535#comment-13170535
 ] 

Hudson commented on CASSANDRA-3626:
---

Integrated in Cassandra-0.8 #419 (See 
[https://builds.apache.org/job/Cassandra-0.8/419/])
Prevent new nodes from thinking down nodes are up forever.
Patch by brandonwilliams, reviewed by Peter Schuller for CASSANDRA-3626

brandonwilliams : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1214916
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java


 Nodes can get stuck in UP state forever, despite being DOWN
 ---

 Key: CASSANDRA-3626
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3626
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.8, 1.0.5
Reporter: Peter Schuller
Assignee: Brandon Williams
 Fix For: 0.8.10, 1.0.7

 Attachments: 3626.txt


 This is a proposed phrasing for an upstream ticket named "Newly discovered 
 nodes that are down get stuck in UP state forever" (will edit w/ feedback 
 until done):
 We have observed a problem with gossip whereby, when you are bootstrapping a 
 new node (or replacing one using the replace_token support), any node in the 
 cluster which is Down at the time the new node is started will be assumed to 
 be Up and then *never ever* flapped back to Down until you restart the node.
 This has at least two implications for replacing or bootstrapping new nodes 
 when there are nodes down in the ring:
 * If the new node happens to select a node listed as UP (but in reality 
 DOWN) as a stream source, streaming will sit there hanging forever.
 * If that doesn't happen (by picking another host), it will instead finish 
 bootstrapping correctly, and begin servicing requests all the while thinking 
 DOWN nodes are UP, and thus routing requests to them, generating timeouts.
 The way to get out of this is to restart the node(s) that you bootstrapped.
 I have tested and confirmed the symptom (that the bootstrapped node thinks 
 other nodes are Up) using a fairly recent 1.0. The main debugging effort 
 happened on 0.8 however, so all details below refer to 0.8 but are probably 
 similar in 1.0.
 Steps to reproduce:
 * Bring up a cluster of >= 3 nodes. *Ensure RF is < N*, so that the cluster 
 is operative with one node removed.
 * Pick two random nodes, A and B. Shut them *both* off.
 * Wait for everyone to realize they are both off (for good measure).
 * Now, take node A, nuke its data directories and re-start it, such that 
 it comes up w/ normal bootstrap (or use replace_token; didn't test that but 
 should not affect it).
 * Watch how node A starts up, all the while believing node B is up, even 
 though all other nodes in the cluster agree that B is down and B is in fact 
 still turned off.
 The mechanism by which it initially goes into Up state is that the node 
 receives a gossip response from any other node in the cluster, and 
 GossipDigestAck2VerbHandler.doVerb() calls Gossiper.applyStateLocally().
 Gossiper.applyStateLocally() doesn't have any local endpoint state for the 
 cluster, so the else statement at the end ("it's a new node") gets triggered 
 and handleMajorStateChange() is called. handleMajorStateChange() always calls 
 markAlive(), unless the state is a "dead" state (but "dead" here does not mean 
 "not up"; it refers to joining/hibernate etc).
 So at this point the node is up in the mind of the node you just bootstrapped.
 Now, in each gossip round doStatusCheck() is called, which iterates over all 
 nodes (including the one falsely Up) and, among other things, calls 
 FailureDetector.interpret() on each node.
 FailureDetector.interpret() is meant to update its sense of Phi for the node, 
 and potentially convict it. However, there is a short-circuit at the top, 
 whereby if we do not yet have any arrival window for the node, we simply 
 return immediately.
 Arrival intervals are only added as a result of a FailureDetector.report() 
 call, which never happens in this case because the initial endpoint state we 
 added, which came from a remote node that was up, had the latest version of 
 the gossip state (so Gossiper.reportFailureDetector() will never call 
 report()).
 The result is that the node can never ever be convicted.
 Now, let's ignore for a moment the problem that a node that is actually Down 
 will be thought to be Up temporarily for a little while. That is sub-optimal, 
 but let's aim for a fix to the more serious problem in this ticket - which is 
 that it stays up forever.
 Considered solutions:
 * When interpret() gets called and there is no arrival window, we could add a 
 faked arrival window 
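
To make the short-circuit and the proposed workaround concrete, here is a much simplified sketch (the class structure, field names and phi calculation are illustrative only, not the actual FailureDetector code): if report() is never called for an endpoint, interpret() finds no arrival window and returns before it can ever convict; seeding a fake window at that point would let phi grow while the node stays silent.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailureDetectorSketch
{
    private final Map<String, ArrivalWindowStub> arrivalSamples = new ConcurrentHashMap<>();
    private final double convictThreshold = 8.0; // phi threshold, illustrative value

    void interpret(String endpoint, long nowMillis)
    {
        ArrivalWindowStub window = arrivalSamples.get(endpoint);
        if (window == null)
        {
            // Current behaviour per the ticket: no samples yet, so we return and the
            // endpoint can never be convicted. The considered fix is to seed a fake
            // arrival window here instead of returning.
            return;
        }
        if (window.phi(nowMillis) > convictThreshold)
            convict(endpoint);
    }

    void report(String endpoint, long nowMillis)
    {
        arrivalSamples.computeIfAbsent(endpoint, e -> new ArrivalWindowStub()).add(nowMillis);
    }

    void convict(String endpoint)
    {
        // notify gossip that the endpoint should be marked down (omitted)
    }

    static class ArrivalWindowStub
    {
        private long last = -1;
        private double meanIntervalMillis = 1000; // crude running mean, illustrative

        void add(long nowMillis)
        {
            if (last > 0)
                meanIntervalMillis = 0.8 * meanIntervalMillis + 0.2 * (nowMillis - last);
            last = nowMillis;
        }

        double phi(long nowMillis)
        {
            // Crude phi-accrual style estimate: time since last heartbeat vs mean interval.
            return last < 0 ? 0 : (nowMillis - last) / meanIntervalMillis;
        }
    }
}
{code}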
