[jira] [Updated] (CASSANDRA-2454) Possible deadlock for counter mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-2454: Attachment: 0001-Submit-counters-update-on-mutation-stage-only-if-not.patch Looking closer at this, there are two places from which we execute a counter write: 1) if the coordinator is a replica, from the thrift thread; 2) otherwise, in CounterMutationVerbHandler on a replica, that is, on the MUTATION stage. In the latter case we must indeed avoid re-submitting to the MUTATION stage to avoid deadlock, but in the former we should not skip the stage. Attaching a v2 patch that distinguishes between the two cases and does the right thing. Note that another way to fix this would be to make CounterMutationVerbHandler execute on some stage other than the MUTATION one. Even though that would be simpler in the number of lines modified, I don't think an existing stage fits the bill, and creating a new one just for this doesn't feel right. Possible deadlock for counter mutations --- Key: CASSANDRA-2454 URL: https://issues.apache.org/jira/browse/CASSANDRA-2454 Project: Cassandra Issue Type: Bug Reporter: Stu Hood Fix For: 0.8 Attachments: 0001-Don-t-re-submit-to-the-mutation-stage.txt, 0001-Submit-counters-update-on-mutation-stage-only-if-not.patch {{StorageProxy.applyCounterMutation}} is executed on the mutation stage, but it also submits tasks to the mutation stage and then blocks on them. If there are more than a few concurrent mutations, this can lead to deadlock. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
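The cycle described in the ticket is the classic self-submission deadlock: a bounded executor's worker blocks on a task that can only run on that same executor. A minimal sketch in plain java.util.concurrent (not Cassandra's actual stage code; the class and method names here are made up for illustration):

```java
import java.util.concurrent.*;

// Sketch of the deadlock described above: a task running on a single-threaded
// "stage" re-submits to the same stage and blocks on the result, so the only
// worker thread ends up waiting on work that only it could run.
class StageDeadlockDemo {
    // Returns true if the self-submitting task fails to finish in time,
    // i.e. the stage deadlocked on itself.
    static boolean deadlocks() {
        ExecutorService mutationStage = Executors.newSingleThreadExecutor();
        Future<?> outer = mutationStage.submit(() -> {
            // Re-submit to our own stage and block for the result.
            Future<?> inner = mutationStage.submit(() -> { /* the local write */ });
            try {
                inner.get(); // blocks forever: no free worker can ever run `inner`
            } catch (InterruptedException | ExecutionException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            outer.get(500, TimeUnit.MILLISECONDS);
            return false;                // completed: no deadlock
        } catch (TimeoutException e) {
            return true;                 // timed out: stage is stuck on itself
        } catch (InterruptedException | ExecutionException e) {
            return false;
        } finally {
            mutationStage.shutdownNow(); // interrupt the stuck worker so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(deadlocks() ? "deadlocked" : "completed");
    }
}
```

With more than one worker thread the same pattern merely degrades until all workers are blocked, which is why the patch distinguishes who is submitting rather than enlarging the pool.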
[jira] [Created] (CASSANDRA-2457) Batch_mutate is broken for counters
Batch_mutate is broken for counters --- Key: CASSANDRA-2457 URL: https://issues.apache.org/jira/browse/CASSANDRA-2457 Project: Cassandra Issue Type: Bug Affects Versions: 0.8 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 0.8 CASSANDRA-2384 allowed batch_mutate to take both counter and non-counter operations, but the code was not updated correctly to handle that case. As it is, the code uses the first mutation in the batch list to decide whether to apply the counter write code path or not, and will thus break if the two are mixed.
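The shape of the needed fix can be sketched as follows. All types here (Mutation, CounterAdd, ColumnWrite, BatchRouter) are hypothetical stand-ins, not Cassandra's real classes; the point is only that each mutation's own type, rather than the first element's, should select the write path:

```java
import java.util.*;

// Hypothetical stand-ins for illustration; Cassandra's real mutation types differ.
interface Mutation {}
class CounterAdd implements Mutation { final long delta; CounterAdd(long d) { delta = d; } }
class ColumnWrite implements Mutation { final String value; ColumnWrite(String v) { value = v; } }

class BatchRouter {
    // Split a mixed batch so each mutation takes its proper write path,
    // instead of keying the whole batch off the first element (the bug above).
    static List<List<Mutation>> split(List<Mutation> batch) {
        List<Mutation> counters = new ArrayList<>();
        List<Mutation> normals = new ArrayList<>();
        for (Mutation m : batch)
            (m instanceof CounterAdd ? counters : normals).add(m);
        return Arrays.asList(counters, normals); // [counter path, normal path]
    }
}
```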
[jira] [Created] (CASSANDRA-2458) cli divides read repair chance by 100
cli divides read repair chance by 100 - Key: CASSANDRA-2458 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Reporter: Aaron Morton Assignee: Aaron Morton Priority: Minor Fix For: 0.7.5, 0.8 The cli incorrectly divides read_repair_chance by 100 when creating/updating CFs.
[jira] [Updated] (CASSANDRA-2458) cli divides read repair chance by 100
[ https://issues.apache.org/jira/browse/CASSANDRA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Morton updated CASSANDRA-2458: Attachment: 0001-do-not-divide-read_repair_chance-by-100.patch The cli now expects read repair chance to be between 0 and 1 when creating and updating a CF. cli divides read repair chance by 100 - Key: CASSANDRA-2458 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Reporter: Aaron Morton Assignee: Aaron Morton Priority: Minor Labels: cli Fix For: 0.6.13, 0.7.5, 0.8 Attachments: 0001-do-not-divide-read_repair_chance-by-100.patch The cli incorrectly divides read_repair_chance by 100 when creating/updating CFs.
Page peanutgyz deleted from Cassandra Wiki
Dear wiki user, You have subscribed to a wiki page Cassandra Wiki for change notification. The page peanutgyz has been deleted by gdusbabek. The comment on this change is: needless page.. http://wiki.apache.org/cassandra/peanutgyz
[jira] [Created] (CASSANDRA-2460) Make scrub deserialize rows
Make scrub deserialize rows --- Key: CASSANDRA-2460 URL: https://issues.apache.org/jira/browse/CASSANDRA-2460 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.4 Reporter: Sylvain Lebresne Right now, scrub doesn't deserialize the columns, and as such there are a number of errors it could fix (or at least corrupted rows it could skip) but doesn't. This ticket proposes to handle those errors.
[jira] [Commented] (CASSANDRA-2452) Modify EC2 Snitch to use public ip and hence natively support EC2 multi-region.
[ https://issues.apache.org/jira/browse/CASSANDRA-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018821#comment-13018821 ] Jonathan Ellis commented on CASSANDRA-2452: --- Is this a one-size-fits-all solution? It seems to me some people would prefer to continue to use the private interface. Maybe this should be a new snitch class. Modify EC2 Snitch to use public ip and hence natively support EC2 multi-region. Key: CASSANDRA-2452 URL: https://issues.apache.org/jira/browse/CASSANDRA-2452 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8 Environment: JVM Reporter: Vijay Assignee: Vijay Priority: Minor Attachments: 2452-EC2Snitch-Changes.txt Make cassandra identify itself using the public ip (to avoid any future conflicts of private ips). 1) Split the logic of identification vs listen address in the code. 2) Move the logic to assign an IP address to the node into EndPointSnitch. 3) Make the EC2 Snitch query for its public ip and use it for identification. 4) Make the EC2 snitch use InetAddress.getLocal to listen on the private ip.
[jira] [Commented] (CASSANDRA-2459) move SSTableScanner to use SSTableReader.dfile instead of BRAF
[ https://issues.apache.org/jira/browse/CASSANDRA-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018824#comment-13018824 ] Jonathan Ellis commented on CASSANDRA-2459: --- (a) we still need to avoid pushing hot rows out w/ our sequential scan; it seems to me that using the same source as the read stage makes this harder, not easier. move SSTableScanner to use SSTableReader.dfile instead of BRAF -- Key: CASSANDRA-2459 URL: https://issues.apache.org/jira/browse/CASSANDRA-2459 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Fix For: 1.0 That can give us the following benefits: a) no need to skip the cache, because we will be operating on memory mappings; b) better performance (no copying of data between kernel and user buffers, an effect gained from using memory-mapped segments, avoiding time spent in kernel mode (plus context-switch and read-ahead pressure) which BRAF involves); c) less impact on live reads; d) less garbage created; e) fewer file descriptors opened.
[jira] [Commented] (CASSANDRA-2459) move SSTableScanner to use SSTableReader.dfile instead of BRAF
[ https://issues.apache.org/jira/browse/CASSANDRA-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018837#comment-13018837 ] Pavel Yaskevich commented on CASSANDRA-2459: Using different sources creates additional I/O pressure, where the page cache for the same file should be preserved for both mmapping and normal I/O. It seems to me that making it a single source with the parallel compaction mechanism from CASSANDRA-2191 will lower the effect on live reads during compaction (also, CASSANDRA-1902 with migration enabled can guarantee that the page cache will be preserved where needed).
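For readers unfamiliar with the mmap-vs-BRAF distinction being debated here, a rough, self-contained illustration using plain NIO (Cassandra's SSTableReader machinery is considerably more involved; the class and method names below are illustrative only):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Rough illustration only: a memory-mapped read serves bytes straight from the
// page-cache mapping, avoiding the per-read kernel-to-user copy that a
// buffered RandomAccessFile (BRAF) incurs.
class MmapReadDemo {
    static byte firstByteMapped(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(0); // no explicit read() syscall; the mapping is paged in
        }
    }

    // Self-contained demo: map a throwaway file and read its first byte back.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("mmap", ".bin");
            Files.write(tmp, new byte[]{42});
            byte b = firstByteMapped(tmp);
            Files.deleteIfExists(tmp);
            return b;
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Both mmap and buffered reads ultimately go through the same page cache, which is the crux of the comment above: two access paths to one file compete for the same cached pages.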
[Cassandra Wiki] Update of HowToPublishToMavenCentral by StephenConnolly
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The HowToPublishToMavenCentral page has been changed by StephenConnolly. http://wiki.apache.org/cassandra/HowToPublishToMavenCentral?action=diff&rev1=9&rev2=10 -- </profiles> </settings> }}} + Notes + 1. You must have either {{{gpg.passphrase}}} or {{{gpg.useagent}}} but not both. + 1. The GPG keyname is the {{{sec}}} hex code for the key, e.g. if you had +{{{ + $ gpg --list-secret-keys + /Users/stephenc/.gnupg/secring.gpg + -- + sec 1024D/B620D787 2009-10-22 + uid Stephen Connolly steph...@apache.org + ssb 4096g/A2590985 2009-10-22 + + $ + }}} + You would use {{{<gpg.keyname>B620D787</gpg.keyname>}}} == Using repository.apache.org == Please read the [[http://www.apache.org/dev/publishing-maven-artifacts.html#common|Common Procedures]] for details of how to close, drop and release staging repositories.
[jira] [Commented] (CASSANDRA-1851) Publish cassandra artifacts to the maven central repository
[ https://issues.apache.org/jira/browse/CASSANDRA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018839#comment-13018839 ] T Jake Luciani commented on CASSANDRA-1851: --- This worked for me, +1. Eric, looks like you might have put the keyname in the passphrase field? Publish cassandra artifacts to the maven central repository --- Key: CASSANDRA-1851 URL: https://issues.apache.org/jira/browse/CASSANDRA-1851 Project: Cassandra Issue Type: Improvement Components: Packaging Affects Versions: 0.7.0 rc 2, 0.7.0 rc 3, 0.7.1, 0.8 Reporter: Stephen Connolly Priority: Minor Fix For: 0.7.5 Attachments: MVN-PUBLISH-v2.patch, MVN-PUBLISH.patch Original Estimate: 0h Remaining Estimate: 0h See http://markmail.org/search/?q=list:org.apache.incubator.cassandra-dev#query:list%3Aorg.apache.incubator.cassandra-dev+page:1+mid:bmcd3ir33p3psqze+state:results I will be attaching a patch to this issue once I have got the ANT build to a state where it can push the cassandra artifacts to the maven central repository.
[jira] [Commented] (CASSANDRA-1851) Publish cassandra artifacts to the maven central repository
[ https://issues.apache.org/jira/browse/CASSANDRA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018840#comment-13018840 ] Eric Evans commented on CASSANDRA-1851: --- bq. Eric, looks like you might have put the keyname in the passphrase field? No, Stephen's latest wiki edit explains it: you can't have a passphrase set _and_ use the agent.
[Cassandra Wiki] Update of HowToPublishToMavenCentral by StephenConnolly
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The HowToPublishToMavenCentral page has been changed by StephenConnolly. The comment on this change is: adding details of password encryption. http://wiki.apache.org/cassandra/HowToPublishToMavenCentral?action=diff&rev1=10&rev2=11 -- $ }}} You would use {{{<gpg.keyname>B620D787</gpg.keyname>}}} + 1. Once you have things working with your ldap password in plaintext, it is highly recommended that you use [[http://maven.apache.org/guides/mini/guide-encryption.html|Maven's encryption support]] to encrypt the password. == Using repository.apache.org == Please read the [[http://www.apache.org/dev/publishing-maven-artifacts.html#common|Common Procedures]] for details of how to close, drop and release staging repositories.
[jira] [Commented] (CASSANDRA-1851) Publish cassandra artifacts to the maven central repository
[ https://issues.apache.org/jira/browse/CASSANDRA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018855#comment-13018855 ] Eric Evans commented on CASSANDRA-1851: --- Any news on when the updated MAT will land?
svn commit: r1091445 - in /cassandra/branches/cassandra-0.8: build.xml conf/cassandra-env.sh lib/jamm-0.2.1.jar lib/jamm-0.2.jar lib/licenses/jamm-0.2.1.txt src/java/org/apache/cassandra/db/Memtable.j
Author: jbellis Date: Tue Apr 12 15:09:28 2011 New Revision: 1091445 URL: http://svn.apache.org/viewvc?rev=1091445&view=rev Log: hack to allow OpenJDK to run w/o javaagent (otherwise it segfaults) patch by jbellis and brandonwilliams; reviewed by Pavel Yaskevich for CASSANDRA-2441 Added: cassandra/branches/cassandra-0.8/lib/jamm-0.2.1.jar (with props) cassandra/branches/cassandra-0.8/lib/licenses/jamm-0.2.1.txt Removed: cassandra/branches/cassandra-0.8/lib/jamm-0.2.jar Modified: cassandra/branches/cassandra-0.8/build.xml cassandra/branches/cassandra-0.8/conf/cassandra-env.sh cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/Memtable.java

Modified: cassandra/branches/cassandra-0.8/build.xml URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/build.xml?rev=1091445&r1=1091444&r2=1091445&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/build.xml (original)
+++ cassandra/branches/cassandra-0.8/build.xml Tue Apr 12 15:09:28 2011
@@ -615,7 +615,7 @@
         <jvmarg value="-Dstorage-config=${test.conf}"/>
         <jvmarg value="-Daccess.properties=${test.conf}/access.properties"/>
         <jvmarg value="-Dlog4j.configuration=log4j-junit.properties" />
-        <jvmarg value="-javaagent:${basedir}/lib/jamm-0.2.jar" />
+        <jvmarg value="-javaagent:${basedir}/lib/jamm-0.2.1.jar" />
         <jvmarg value="-ea"/>
         <optjvmargs/>
         <classpath>

Modified: cassandra/branches/cassandra-0.8/conf/cassandra-env.sh URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/conf/cassandra-env.sh?rev=1091445&r1=1091444&r2=1091445&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/conf/cassandra-env.sh (original)
+++ cassandra/branches/cassandra-0.8/conf/cassandra-env.sh Tue Apr 12 15:09:28 2011
@@ -92,7 +92,11 @@
 JMX_PORT=7199
 JVM_OPTS="$JVM_OPTS -ea"
 # add the jamm javaagent
-JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.jar"
+java_version=`java -version 2>&1`
+if [[ $java_version != *OpenJDK* ]]
+then
+    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.1.jar"
+fi
 # enable thread priorities, primarily so we can
give periodic tasks # a lower priority to avoid interfering with client workload Added: cassandra/branches/cassandra-0.8/lib/jamm-0.2.1.jar URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/lib/jamm-0.2.1.jar?rev=1091445&view=auto == Binary file - no diff available. Propchange: cassandra/branches/cassandra-0.8/lib/jamm-0.2.1.jar -- svn:mime-type = application/octet-stream Added: cassandra/branches/cassandra-0.8/lib/licenses/jamm-0.2.1.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/lib/licenses/jamm-0.2.1.txt?rev=1091445&view=auto == --- cassandra/branches/cassandra-0.8/lib/licenses/jamm-0.2.1.txt (added) +++ cassandra/branches/cassandra-0.8/lib/licenses/jamm-0.2.1.txt Tue Apr 12 15:09:28 2011 @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by
svn commit: r1091447 - in /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db: ColumnFamilyStore.java RowIteratorFactory.java columniterator/IColumnIterator.java columniterator/SSTableS
Author: jbellis Date: Tue Apr 12 15:12:00 2011 New Revision: 1091447 URL: http://svn.apache.org/viewvc?rev=1091447&view=rev Log: r/m unnecessary declaration of IOException from IColumnIterator.getColumnFamily patch by Stu Hood; reviewed by jbellis for CASSANDRA-2446 Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/RowIteratorFactory.java cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/IColumnIterator.java cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/SSTableSliceIterator.java cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/SimpleSliceReader.java Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java?rev=1091447&r1=1091446&r2=1091447&view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java Tue Apr 12 15:12:00 2011 @@ -1232,15 +1232,7 @@ public class ColumnFamilyStore implement } IColumnIterator ci = filter.getMemtableColumnIterator(cached, null, getComparator()); -ColumnFamily cf = null; -try -{ -cf = ci.getColumnFamily().cloneMeShallow(); -} -catch (IOException e) -{ -throw new IOError(e); -} +ColumnFamily cf = ci.getColumnFamily().cloneMeShallow(); filter.collectCollatedColumns(cf, ci, gcBefore); // TODO this is necessary because when we collate supercolumns together, we don't check // their subcolumns for relevance, so we need to do a second prune post facto here.
@@ -1302,10 +1294,6 @@ public class ColumnFamilyStore implement // and there used to be data, but it's gone now (we should cache the empty CF so we don't need to rebuild that slower) return returnCF; } -catch (IOException e) -{ -throw new IOError(e); -} finally { /* close all cursors */ Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/RowIteratorFactory.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/RowIteratorFactory.java?rev=1091447&r1=1091446&r2=1091447&view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/RowIteratorFactory.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/RowIteratorFactory.java Tue Apr 12 15:12:00 2011 @@ -118,14 +118,7 @@ public class RowIteratorFactory { this.colIters.add(current); this.key = current.getKey(); -try -{ -this.returnCF.delete(current.getColumnFamily()); -} -catch (IOException e) -{ -throw new IOError(e); -} +this.returnCF.delete(current.getColumnFamily()); } @Override Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/IColumnIterator.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/IColumnIterator.java?rev=1091447&r1=1091446&r2=1091447&view=diff == --- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/IColumnIterator.java (original) +++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/columniterator/IColumnIterator.java Tue Apr 12 15:12:00 2011 @@ -31,14 +31,9 @@ import org.apache.cassandra.db.IColumn; public interface IColumnIterator extends Iterator<IColumn> { /** - * returns the CF of the column being iterated. - * Do not modify the returned CF; clone first. - * This is guaranteed to be non-null and that the returned CF have the correct metadata - * (markedForDeleteAt and localDeletionTime).
The full CF is however only guaranteed to - * be available after a call to next() or hasNext(). - * @throws IOException + * @return An empty CF holding metadata for the row being iterated. */ -public abstract ColumnFamily getColumnFamily() throws IOException; +public abstract ColumnFamily getColumnFamily(); /** * @return the current row key Modified:
[jira] [Resolved] (CASSANDRA-2456) using NTS, you get an error (datacenter has no more endpoints) when there are no nodes in a DC
[ https://issues.apache.org/jira/browse/CASSANDRA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2456. --- Resolution: Not A Problem It's an error to specify a nonzero replication factor for a DC that has zero nodes in it. Use DC3:0 if you want to perform writes before you actually have nodes in it. using NTS, you get an error (datacenter has no more endpoints) when there are no nodes in a DC Key: CASSANDRA-2456 URL: https://issues.apache.org/jira/browse/CASSANDRA-2456 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Reporter: Josep M. Blanquer If one creates a brand new DC in the NTS (NetworkTopologyStrategy) for which no node has been seen, you'll get exceptions for any write consistency level after that. Note that you don't get the problem if you have a DC for which you have all nodes marked 'down'. It seems just a bug when there are 0 nodes seen in the DC. To reproduce: 1- Go to a running NTS cluster, and add a new DC in the strategy options. I.e., from the console, assuming you have DC1 and DC2 normally, add a DC3: update keyspace WhateverKeyspace with strategy_options=[{DC1:2,DC2:1,DC3:1}]; 2- Try to write... and you'll get: java.lang.IllegalStateException: datacenter (DC3) has no more endpoints, (1) replicas still needed but if you boot a node in DC3, and then stop it... the writes will succeed. I believe it should always succeed, to be consistent? Otherwise one needs to boot nodes in the right DCs, get the snitches propagated and all... all before changing the NTS strategy options. Maybe that's fine... but it feels inconsistent with succeeding when a whole DC is down.
svn commit: r1091455 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cli/CliClient.java
Author: jbellis Date: Tue Apr 12 15:17:52 2011 New Revision: 1091455 URL: http://svn.apache.org/viewvc?rev=1091455&view=rev Log: cli no longer divides read_repair_chance by 100 patch by Aaron Morton; reviewed by jbellis for CASSANDRA-2458 Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1091455&r1=1091454&r2=1091455&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Apr 12 15:17:52 2011
@@ -22,6 +22,7 @@
  * remove nodetool loadbalance (CASSANDRA-2448)
  * multithreaded compaction (CASSANDRA-2191)
  * compaction throttling (CASSANDRA-2156)
+ * cli no longer divides read_repair_chance by 100 (CASSANDRA-2458)

 0.7.5

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java?rev=1091455&r1=1091454&r2=1091455&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cli/CliClient.java Tue Apr 12 15:17:52 2011
@@ -1039,10 +1039,10 @@ public class CliClient extends CliUserHe
                 cfDef.setKey_cache_size(Double.parseDouble(mValue));
                 break;
             case READ_REPAIR_CHANCE:
-                double chance = Double.parseDouble(mValue) / 100;
+                double chance = Double.parseDouble(mValue);

-                if (chance > 1)
-                    throw new RuntimeException("Error: read_repair_chance / 100 should not be greater than 1.");
+                if (chance < 0 || chance > 1)
+                    throw new RuntimeException("Error: read_repair_chance must be between 0 and 1.");

                 cfDef.setRead_repair_chance(chance);
                 break;
[jira] [Updated] (CASSANDRA-2458) cli divides read repair chance by 100
[ https://issues.apache.org/jira/browse/CASSANDRA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2458: -- Affects Version/s: (was: 0.7.4) 0.6 Fix Version/s: (was: 0.7.5) (was: 0.6.13) cli divides read repair chance by 100 - Key: CASSANDRA-2458 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Aaron Morton Assignee: Aaron Morton Priority: Minor Labels: cli Fix For: 0.8 Attachments: 0001-do-not-divide-read_repair_chance-by-100.patch cli incorrectly divides the read_repair chance by 100 when creating / updating CF's -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2458) cli divides read repair chance by 100
[ https://issues.apache.org/jira/browse/CASSANDRA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2458: -- Affects Version/s: (was: 0.6) 0.7.0
[jira] [Assigned] (CASSANDRA-2406) Secondary index and index expression problems
[ https://issues.apache.org/jira/browse/CASSANDRA-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2406: - Assignee: Pavel Yaskevich (was: Jonathan Ellis) Pavel, can you take a stab at figuring out why the ranges overlap? They are not supposed to. (I am using https://github.com/pcmanus/ccm for testing, it saves a lot of time.) Secondary index and index expression problems - Key: CASSANDRA-2406 URL: https://issues.apache.org/jira/browse/CASSANDRA-2406 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: CentOS 5.5 (64bit), JDK 1.6.0_23 Reporter: Muga Nishizawa Assignee: Pavel Yaskevich Fix For: 0.7.5 Attachments: CASSANDRA-2406-debug.patch, create_table.cli, node-1.system.log, secondary_index_checkv2.py, secondary_index_insertv2.py When I iteratively get data with a secondary index and index clause, the result of data acquired at consistency level one is different from the result at consistency level quorum. The result at consistency level one is correct, but the result at consistency level quorum is incorrect: data is dropped by Cassandra. You can reproduce the bug by executing the attached programs. - 1. Start the Cassandra cluster. It consists of 3 cassandra nodes and distributes data by ByteOrderedPartitioner. Initial tokens of those nodes are [31, 32, 33]. - 2. Create the keyspace and column family, according to create_table.cli. - 3. Execute secondary_index_insertv2.py, inserting a few hundred columns into the cluster. - 4. Execute secondary_index_checkv2.py and get data with the secondary index and index clause iteratively. secondary_index_insertv2.py and secondary_index_checkv2.py require pycassa. You can run the secondary_index_checkv2.py script from step 4 with the following option to get data at consistency level one: % python secondary_index_checkv2.py -one On the other hand, to acquire data at consistency level quorum, you will need to use the following option:
% python secondary_index_checkv2.py -quorum You can check that the result of data acquired at consistency level one differs from the one at consistency level quorum.
[jira] [Assigned] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring
[ https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2405: - Assignee: Pavel Yaskevich Good idea, let's create an AntiEntropyServiceMBean (under o.a.c.db) and expose it there. should expose 'time since last successful repair' for easier aes monitoring --- Key: CASSANDRA-2405 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405 Project: Cassandra Issue Type: Improvement Reporter: Peter Schuller Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.7.5 The practical implementation issues of actually ensuring repair runs is somewhat of an undocumented/untreated issue. One hopefully low hanging fruit would be to at least expose the time since last successful repair for a particular column family, to make it easier to write a correct script to monitor for lack of repair in a non-buggy fashion. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
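A minimal sketch of what such an MBean might look like. The interface name follows the suggestion above, but the method and the in-memory bookkeeping are illustrative assumptions only, not the shape of any committed patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: expose "time since last successful repair" per CF.
// Real code would register this with the platform MBeanServer under o.a.c.db.
interface AntiEntropyServiceMBean {
    /** @return millis since the last successful repair of this CF, or -1 if none recorded */
    long getTimeSinceLastSuccessfulRepair(String keyspaceAndCf);
}

class AntiEntropyService implements AntiEntropyServiceMBean {
    private final Map<String, Long> lastRepairAt = new ConcurrentHashMap<>();

    // Called by the repair path when a session for this CF completes successfully.
    void recordSuccessfulRepair(String keyspaceAndCf) {
        lastRepairAt.put(keyspaceAndCf, System.currentTimeMillis());
    }

    @Override
    public long getTimeSinceLastSuccessfulRepair(String keyspaceAndCf) {
        Long t = lastRepairAt.get(keyspaceAndCf);
        return t == null ? -1 : System.currentTimeMillis() - t;
    }
}
```

A monitoring script would then alert when the returned value exceeds gc_grace_seconds, which is the failure mode the ticket is trying to make detectable.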
[jira] [Updated] (CASSANDRA-2405) should expose 'time since last successful repair' for easier aes monitoring
[ https://issues.apache.org/jira/browse/CASSANDRA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2405: -- Fix Version/s: 0.7.5 should expose 'time since last successful repair' for easier aes monitoring --- Key: CASSANDRA-2405 URL: https://issues.apache.org/jira/browse/CASSANDRA-2405 Project: Cassandra Issue Type: Improvement Reporter: Peter Schuller Assignee: Pavel Yaskevich Priority: Minor Fix For: 0.7.5 The practical implementation issues of actually ensuring repair runs are a largely undocumented/untreated topic. One hopefully low-hanging fruit would be to at least expose the time since the last successful repair for a particular column family, to make it easier to write a correct script that monitors for lack of repair in a non-buggy fashion. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
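The bookkeeping the proposed MBean would expose can be illustrated with a small sketch. This is hypothetical Python, not the AntiEntropyServiceMBean interface; all names are invented, and it only shows the state a monitoring script would poll.

```python
import time

class RepairTracker:
    """Hypothetical sketch: record the last successful repair per column family."""
    def __init__(self, clock=time.time):
        self.clock = clock
        self.last_success = {}   # (keyspace, column_family) -> epoch seconds

    def record_success(self, keyspace, cf):
        self.last_success[(keyspace, cf)] = self.clock()

    def seconds_since_last_repair(self, keyspace, cf):
        # None means "never successfully repaired", which a monitor
        # should treat as an alert, not as "recently repaired"
        ts = self.last_success.get((keyspace, cf))
        return None if ts is None else self.clock() - ts

# deterministic fake clock so the example is reproducible
now = [1000.0]
tracker = RepairTracker(clock=lambda: now[0])
tracker.record_success("Keyspace1", "Standard1")
now[0] = 1600.0
elapsed = tracker.seconds_since_last_repair("Keyspace1", "Standard1")
```

A monitoring script would then alert when `elapsed` exceeds, say, `gc_grace_seconds`.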
[jira] [Updated] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.
[ https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2261: -- Affects Version/s: (was: 0.6) Fix Version/s: (was: 0.8) 1.0 During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted. --- Key: CASSANDRA-2261 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benjamin Coverston Assignee: Benjamin Coverston Priority: Minor Labels: not_a_pony Fix For: 1.0 Attachments: 2261.patch When a compaction of a set of SSTables fails because of corruption it will continue to try to compact that SSTable causing pending compactions to build up. One way to mitigate this problem would be to log the error, then identify the specific SSTable that caused the failure, subsequently blacklisting that SSTable and ensuring that it is no longer included in future compactions. For this we could simply store the problematic SSTable's name in memory. If it's not possible to identify the SSTable that caused the issue, then perhaps blacklisting the (ordered) permutation of SSTables to be compacted together is something that can be done to solve this problem in a more general case, and avoid issues where two (or more) SSTables have trouble compacting a particular row. For this option we would probably want to store the lists of the bad combinations in the system table somewhere s.t. these can survive a node failure (there have been a few cases where I have seen a compaction cause a node failure). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
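The simpler of the two options (storing the problematic SSTable's name in memory) can be sketched as follows. This is hypothetical Python, not Cassandra's compaction code; the class and file names are invented.

```python
class CompactionBlacklist:
    """Hypothetical in-memory blacklist of SSTables that failed compaction."""
    def __init__(self):
        self.bad = set()

    def mark_corrupt(self, sstable_name):
        self.bad.add(sstable_name)

    def candidates(self, sstables):
        # blacklisted tables are excluded from all future compactions,
        # so pending compactions stop piling up behind the corrupt one
        return [s for s in sstables if s not in self.bad]

blacklist = CompactionBlacklist()
tables = ["cf-1-Data.db", "cf-2-Data.db", "cf-3-Data.db"]
try:
    # simulate the compaction failure that identified the bad SSTable
    raise IOError("corrupt row in cf-2-Data.db")
except IOError:
    blacklist.mark_corrupt("cf-2-Data.db")
remaining = blacklist.candidates(tables)
```

The more general option in the ticket (blacklisting a permutation of SSTables, persisted to the system table) would replace the `set` of names with a persisted set of ordered tuples.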
[jira] [Updated] (CASSANDRA-2460) Make scrub deserialize rows
[ https://issues.apache.org/jira/browse/CASSANDRA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2460: -- Affects Version/s: (was: 0.7.4) Fix Version/s: 0.7.5 Assignee: Sylvain Lebresne Make scrub deserialize rows --- Key: CASSANDRA-2460 URL: https://issues.apache.org/jira/browse/CASSANDRA-2460 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 0.7.5 Original Estimate: 4h Remaining Estimate: 4h Right now, scrub doesn't deserialize the columns, and as such there are a number of errors it could fix (or at least corrupted rows it could skip) but doesn't. This ticket proposes to handle those errors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
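The shape of the proposal is easy to sketch: deserialize every row fully, keep the good ones, and skip (rather than crash on) the ones that fail. This is a hypothetical Python toy, not Cassandra's scrub code; the "key:value" row format is invented to stand in for real column deserialization.

```python
def scrub(rows, deserialize):
    """Sketch: fully deserialize each row; keep good rows, skip corrupted ones."""
    kept, skipped = [], []
    for raw in rows:
        try:
            kept.append(deserialize(raw))
        except ValueError:
            # a corrupted row is skipped instead of aborting the whole scrub
            skipped.append(raw)
    return kept, skipped

def toy_deserialize(raw):
    # stand-in for real column deserialization: rows are "key:value" strings
    key, sep, value = raw.partition(":")
    if not sep:
        raise ValueError("corrupt row: %r" % raw)
    return (key, value)

kept, skipped = scrub(["a:1", "garbage", "b:2"], toy_deserialize)
```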
[jira] [Resolved] (CASSANDRA-2373) Index Out Of Bounds during Validation Compaction (Repair)
[ https://issues.apache.org/jira/browse/CASSANDRA-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2373. --- Resolution: Duplicate Fix Version/s: (was: 0.7.5) Assignee: (was: Sylvain Lebresne) CASSANDRA-2460 is open now to fully deserialize during scrub. Index Out Of Bounds during Validation Compaction (Repair) - Key: CASSANDRA-2373 URL: https://issues.apache.org/jira/browse/CASSANDRA-2373 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: CentOS on EC2 Reporter: Benjamin Coverston Stack Trace is below: ERROR [CompactionExecutor:1] 2011-03-23 19:11:39,488 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.lang.IndexOutOfBoundsException at java.nio.Buffer.checkIndex(Unknown Source) at java.nio.HeapByteBuffer.getInt(Unknown Source) at org.apache.cassandra.db.DeletedColumn.getLocalDeletionTime(DeletedColumn.java:57) at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:879) at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:866) at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:857) at org.apache.cassandra.io.PrecompactedRow.init(PrecompactedRow.java:94) at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:147) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at 
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doValidationCompaction(CompactionManager.java:822) at org.apache.cassandra.db.CompactionManager.access$800(CompactionManager.java:56) at org.apache.cassandra.db.CompactionManager$6.call(CompactionManager.java:358) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2459) move SSTableScanner to use SSTableReader.dfile instead of BRAF
[ https://issues.apache.org/jira/browse/CASSANDRA-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-2459: --- Priority: Minor (was: Major) move SSTableScanner to use SSTableReader.dfile instead of BRAF -- Key: CASSANDRA-2459 URL: https://issues.apache.org/jira/browse/CASSANDRA-2459 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Priority: Minor Fix For: 1.0 That can give us the following benefits: a) no need to skip the cache, because we will be operating on memory mappings; b) better performance (memory-mapped segments avoid copying data between kernel and user buffers, as well as the time spent in kernel mode, context switching, and read-ahead pressure that BRAF involves); c) less impact on live reads; d) less garbage created; e) fewer file descriptors opened. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
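The copy-avoidance argument in (a) and (b) can be seen with the standard-library `mmap` module. This is only an illustration of memory-mapped reads in general, not the SSTableReader.dfile code; the file contents are invented.

```python
import mmap
import os
import tempfile

# write a small data file standing in for an SSTable
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello sstable")

# read it through a memory mapping: pages come straight from the page
# cache, with no per-read() copy into a separate userspace buffer
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = bytes(mm[:5])
    mm.close()
os.unlink(path)
```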
[jira] [Commented] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018878#comment-13018878 ] Jonathan Ellis commented on CASSANDRA-2326: --- a --values [list of values] option sounds like a good solution to me. doesn't have to be specific to index queries, but it's most useful there obviously. stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018880#comment-13018880 ] Pavel Yaskevich commented on CASSANDRA-2326: I agree. I was also thinking about a flag that would use the old values generator instead of the random one as a possible solution. stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018884#comment-13018884 ] Brandon Williams commented on CASSANDRA-2326: - I like the idea of adding a flag for the old behavior, that Just Worked and didn't require more command line mess. I'd even be onboard with making it the default and having random as an option, since random hasn't bought us much, if anything, yet. stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018886#comment-13018886 ] Pavel Yaskevich commented on CASSANDRA-2326: I agree with Brandon on this, what do you think Jonathan? stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1301#comment-1301 ] Jonathan Ellis commented on CASSANDRA-2326: --- sgtm stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
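The consensus the thread reaches (a --values list option, the old deterministic generator as the default, and random behind a flag) could look roughly like the sketch below. The option names and helper functions are hypothetical, not stress.java's actual flags, and the sketch is Python rather than the tool's Java.

```python
import argparse
import random

def build_parser():
    # hypothetical option names, mirroring the thread's proposal
    p = argparse.ArgumentParser(prog="stress")
    p.add_argument("--values", nargs="+", default=None,
                   help="explicit list of values to insert (most useful for index queries)")
    p.add_argument("--random-values", action="store_true",
                   help="opt in to the random generator; the old one is the default")
    return p

def next_value(args, i, rng=random.Random(42)):
    if args.values:                       # user-supplied list, cycled
        return args.values[i % len(args.values)]
    if args.random_values:                # random generator behind a flag
        return "%08x" % rng.getrandbits(32)
    return "value%d" % i                  # old deterministic generator by default

args = build_parser().parse_args(["--values", "red", "green"])
vals = [next_value(args, i) for i in range(3)]
```

With no flags at all, the deterministic generator "Just Works" for index queries because repeated runs produce the same values.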
[jira] [Updated] (CASSANDRA-2452) Modify EC2 Snitch to use public ip and hence natively support for EC2 mult-region's.
[ https://issues.apache.org/jira/browse/CASSANDRA-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-2452: - Attachment: 2452-Intro-EC2MultiRegionSnitch-V2.txt V2 with the recommended way... Thanks! Modify EC2 Snitch to use public ip and hence natively support for EC2 mult-region's. Key: CASSANDRA-2452 URL: https://issues.apache.org/jira/browse/CASSANDRA-2452 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.8 Environment: JVM Reporter: Vijay Assignee: Vijay Priority: Minor Attachments: 2452-EC2Snitch-Changes.txt, 2452-Intro-EC2MultiRegionSnitch-V2.txt Make Cassandra identify itself using the public IP (to avoid any future conflicts with private IPs). 1) Split the logic of identification vs. listen address in the code. 2) Move the logic that assigns the IP address to the node into EndPointSnitch. 3) Make the EC2 snitch query for its public IP and use it for identification. 4) Make the EC2 snitch use InetAddress.getLocal to listen on the private IP. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2429) Silence non-errors while reconnecting
[ https://issues.apache.org/jira/browse/CASSANDRA-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2429: -- Reviewer: j.casares Affects Version/s: (was: 0.7.0) Fix Version/s: 0.7.5 Assignee: Jonathan Ellis Silence non-errors while reconnecting - Key: CASSANDRA-2429 URL: https://issues.apache.org/jira/browse/CASSANDRA-2429 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.6 Reporter: Joaquin Casares Assignee: Jonathan Ellis Priority: Trivial Fix For: 0.7.5 Attachments: 2429.txt This is from 0.6.3: INFO [Timer-0] 2010-10-28 18:49:08,714 Gossiper.java (line 181) InetAddress /192.168.80.17 is now dead. INFO [WRITE-/192.168.80.17] 2010-10-28 18:49:48,719 OutboundTcpConnection.java (line 102) error writing to /192.168.80.17 INFO [GC inspection] 2010-10-28 18:50:18,425 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 3463 ms, 77366288 reclaimed leaving 1006804528 used; max is 1207828480 INFO [GC inspection] 2010-10-28 18:50:29,792 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 3465 ms, 71343920 reclaimed leaving 1013965608 used; max is 1207828480 INFO [GMFD:1] 2010-10-28 18:50:32,713 Gossiper.java (line 593) Node /192.168.80.17 has restarted, now UP again INFO [HINTED-HANDOFF-POOL:1] 2010-10-28 18:50:32,713 HintedHandOffManager.java (line 153) Started hinted handoff for endPoint /192.168.80.17 INFO [GMFD:1] 2010-10-28 18:50:32,713 StorageService.java (line 548) Node /192.168.80.17 state jump to normal This is from 0.7.1: INFO [ScheduledTasks:1] 2011-02-25 17:54:41,445 Gossiper.java (line 224) InetAddress /10.240.17.235 is now dead. 
INFO [GossipStage:1] 2011-02-25 17:54:41,449 Gossiper.java (line 605) InetAddress /10.240.17.235 is now UP INFO [GossipStage:1] 2011-02-25 17:55:05,570 Gossiper.java (line 619) Node /10.241.103.223 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 17:55:20,581 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.240.17.235 INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.240.17.235 INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 272) Sleeping 56493ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 17:56:17,077 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.253.202.128 INFO [HintedHandoff:1] 2011-02-25 17:56:17,077 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.253.202.128 INFO [HintedHandoff:1] 2011-02-25 17:56:17,078 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:56:17,078 HintedHandOffManager.java (line 272) Sleeping 8680ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 17:56:25,758 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.241.103.223 INFO [HintedHandoff:1] 2011-02-25 17:56:25,759 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.241.103.223 INFO [GossipStage:1] 2011-02-25 17:58:30,021 Gossiper.java (line 619) Node /10.253.183.111 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 17:58:30,022 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:58:30,022 HintedHandOffManager.java (line 272) Sleeping 43058ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 
17:59:13,080 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.253.183.111 INFO [HintedHandoff:1] 2011-02-25 17:59:13,081 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.253.183.111 INFO [GossipStage:1] 2011-02-25 18:00:27,827 Gossiper.java (line 619) Node /10.194.249.140 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 18:00:27,827 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 18:00:27,827 HintedHandOffManager.java (line 272) Sleeping 43035ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 18:01:10,863 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.194.249.140 INFO [HintedHandoff:1] 2011-02-25 18:01:10,863 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.194.249.140 They are all over most logs and are very apparent when
[jira] [Updated] (CASSANDRA-2429) Silence non-errors while reconnecting
[ https://issues.apache.org/jira/browse/CASSANDRA-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2429: -- Attachment: 2429.txt patch to move the socket error and a couple of the HH lines to debug level Silence non-errors while reconnecting - Key: CASSANDRA-2429 URL: https://issues.apache.org/jira/browse/CASSANDRA-2429 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.6 Reporter: Joaquin Casares Priority: Trivial Fix For: 0.7.5 Attachments: 2429.txt This is from 0.6.3: INFO [Timer-0] 2010-10-28 18:49:08,714 Gossiper.java (line 181) InetAddress /192.168.80.17 is now dead. INFO [WRITE-/192.168.80.17] 2010-10-28 18:49:48,719 OutboundTcpConnection.java (line 102) error writing to /192.168.80.17 INFO [GC inspection] 2010-10-28 18:50:18,425 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 3463 ms, 77366288 reclaimed leaving 1006804528 used; max is 1207828480 INFO [GC inspection] 2010-10-28 18:50:29,792 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 3465 ms, 71343920 reclaimed leaving 1013965608 used; max is 1207828480 INFO [GMFD:1] 2010-10-28 18:50:32,713 Gossiper.java (line 593) Node /192.168.80.17 has restarted, now UP again INFO [HINTED-HANDOFF-POOL:1] 2010-10-28 18:50:32,713 HintedHandOffManager.java (line 153) Started hinted handoff for endPoint /192.168.80.17 INFO [GMFD:1] 2010-10-28 18:50:32,713 StorageService.java (line 548) Node /192.168.80.17 state jump to normal This is from 0.7.1: INFO [ScheduledTasks:1] 2011-02-25 17:54:41,445 Gossiper.java (line 224) InetAddress /10.240.17.235 is now dead. 
INFO [GossipStage:1] 2011-02-25 17:54:41,449 Gossiper.java (line 605) InetAddress /10.240.17.235 is now UP INFO [GossipStage:1] 2011-02-25 17:55:05,570 Gossiper.java (line 619) Node /10.241.103.223 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 17:55:20,581 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.240.17.235 INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.240.17.235 INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:55:20,583 HintedHandOffManager.java (line 272) Sleeping 56493ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 17:56:17,077 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.253.202.128 INFO [HintedHandoff:1] 2011-02-25 17:56:17,077 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.253.202.128 INFO [HintedHandoff:1] 2011-02-25 17:56:17,078 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:56:17,078 HintedHandOffManager.java (line 272) Sleeping 8680ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 17:56:25,758 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.241.103.223 INFO [HintedHandoff:1] 2011-02-25 17:56:25,759 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.241.103.223 INFO [GossipStage:1] 2011-02-25 17:58:30,021 Gossiper.java (line 619) Node /10.253.183.111 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 17:58:30,022 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 17:58:30,022 HintedHandOffManager.java (line 272) Sleeping 43058ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 
17:59:13,080 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.253.183.111 INFO [HintedHandoff:1] 2011-02-25 17:59:13,081 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.253.183.111 INFO [GossipStage:1] 2011-02-25 18:00:27,827 Gossiper.java (line 619) Node /10.194.249.140 has restarted, now UP again INFO [HintedHandoff:1] 2011-02-25 18:00:27,827 HintedHandOffManager.java (line 266) Checking remote schema before delivering hints INFO [HintedHandoff:1] 2011-02-25 18:00:27,827 HintedHandOffManager.java (line 272) Sleeping 43035ms to stagger hint delivery INFO [HintedHandoff:1] 2011-02-25 18:01:10,863 HintedHandOffManager.java (line 286) Started hinted handoff for endpoint /10.194.249.140 INFO [HintedHandoff:1] 2011-02-25 18:01:10,863 HintedHandOffManager.java (line 342) Finished hinted handoff of 0 rows to endpoint /10.194.249.140 They are all over most logs and are very apparent when searching for error. -- This message is automatically generated by JIRA. For
[jira] [Updated] (CASSANDRA-2452) New EC2 Snitch to use public ip and hence natively support for EC2 mult-region's.
[ https://issues.apache.org/jira/browse/CASSANDRA-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2452: -- Reviewer: brandon.williams Fix Version/s: 0.8 Issue Type: New Feature (was: Improvement) Summary: New EC2 Snitch to use public ip and hence natively support for EC2 mult-region's. (was: Modify EC2 Snitch to use public ip and hence natively support for EC2 mult-region's.) New EC2 Snitch to use public ip and hence natively support for EC2 mult-region's. - Key: CASSANDRA-2452 URL: https://issues.apache.org/jira/browse/CASSANDRA-2452 Project: Cassandra Issue Type: New Feature Components: Core Affects Versions: 0.8 Environment: JVM Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 0.8 Attachments: 2452-EC2Snitch-Changes.txt, 2452-Intro-EC2MultiRegionSnitch-V2.txt Make Cassandra identify itself using the public IP (to avoid any future conflicts with private IPs). 1) Split the logic of identification vs. listen address in the code. 2) Move the logic that assigns the IP address to the node into EndPointSnitch. 3) Make the EC2 snitch query for its public IP and use it for identification. 4) Make the EC2 snitch use InetAddress.getLocal to listen on the private IP. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
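The split the ticket describes amounts to carrying two addresses per node: gossip/identify with the public one, bind sockets on the private one. A minimal hypothetical sketch (the function and key names are invented; on EC2 the public address is available from the instance metadata service at http://169.254.169.254/latest/meta-data/public-ipv4):

```python
def pick_addresses(private_ip, public_ip):
    """Hypothetical sketch of the proposed split: identify/gossip with the
    public address, listen on the private one."""
    return {
        "broadcast": public_ip,   # what the node advertises across regions
        "listen": private_ip,     # what the node actually binds
    }

addrs = pick_addresses("10.0.0.5", "54.210.1.2")
```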
[jira] [Commented] (CASSANDRA-2446) Remove unnecessary IOException from IColumnIterator.getColumnFamily
[ https://issues.apache.org/jira/browse/CASSANDRA-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018901#comment-13018901 ] Hudson commented on CASSANDRA-2446: --- Integrated in Cassandra-0.8 #2 (See [https://hudson.apache.org/hudson/job/Cassandra-0.8/2/]) r/m unnecessary declaration of IOException from IColumnIterator.getColumnFamily patch by Stu Hood; reviewed by jbellis for CASSANDRA-2446 Remove unnecessary IOException from IColumnIterator.getColumnFamily --- Key: CASSANDRA-2446 URL: https://issues.apache.org/jira/browse/CASSANDRA-2446 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Assignee: Stu Hood Priority: Trivial Fix For: 0.8 Attachments: 0001-CASSANDRA-2446-Remove-IOException-from-IColumnIterator.txt IColumnIterator.getColumnFamily throws IOException unnecessarily. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2458) cli divides read repair chance by 100
[ https://issues.apache.org/jira/browse/CASSANDRA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018902#comment-13018902 ] Hudson commented on CASSANDRA-2458: --- Integrated in Cassandra-0.8 #2 (See [https://hudson.apache.org/hudson/job/Cassandra-0.8/2/]) cli no longer divides read_repair_chance by 100 patch by Aaron Morton; reviewed by jbellis for CASSANDRA-2458 cli divides read repair chance by 100 - Key: CASSANDRA-2458 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.0 Reporter: Aaron Morton Assignee: Aaron Morton Priority: Minor Labels: cli Fix For: 0.8 Attachments: 0001-do-not-divide-read_repair_chance-by-100.patch cli incorrectly divides the read_repair chance by 100 when creating / updating CF's -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
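The effect of the bug is easy to see with a before/after sketch. The function names are hypothetical, not the cli's actual code: a user who enters a fractional chance of 0.1 (10%) ends up with 0.001 (0.1%).

```python
def buggy_set_read_repair_chance(chance):
    # what the cli did: divide an already-fractional chance by 100
    return chance / 100.0

def fixed_set_read_repair_chance(chance):
    # what the patch does: pass the fraction through unchanged
    return chance

buggy = buggy_set_read_repair_chance(0.1)   # user wanted 10%, stored ~0.1%
fixed = fixed_set_read_repair_chance(0.1)
```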
[jira] [Updated] (CASSANDRA-2238) Allow nodetool to print out hostnames given an option
[ https://issues.apache.org/jira/browse/CASSANDRA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2238: -- Affects Version/s: (was: 0.7.2) Allow nodetool to print out hostnames given an option - Key: CASSANDRA-2238 URL: https://issues.apache.org/jira/browse/CASSANDRA-2238 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Joaquin Casares Assignee: paul cannon Priority: Trivial Fix For: 0.8 Give nodetool the option of either displaying IPs or hostnames for the nodes in a ring. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-2238) Allow nodetool to print out hostnames given an option
[ https://issues.apache.org/jira/browse/CASSANDRA-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2238: - Assignee: paul cannon (was: Joaquin Casares) I'm okay with marking this wontfix but it seems like it should be a pretty easy feature to add. Allow nodetool to print out hostnames given an option - Key: CASSANDRA-2238 URL: https://issues.apache.org/jira/browse/CASSANDRA-2238 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 0.7.2 Reporter: Joaquin Casares Assignee: paul cannon Priority: Trivial Fix For: 0.8 Give nodetool the option of either displaying IPs or hostnames for the nodes in a ring. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-2061) Missing logging for some exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2061: - Assignee: paul cannon (was: Jonathan Ellis) Missing logging for some exceptions --- Key: CASSANDRA-2061 URL: https://issues.apache.org/jira/browse/CASSANDRA-2061 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: paul cannon Priority: Minor Fix For: 0.7.5 Attachments: 2061-0.7.txt, 2061.txt Original Estimate: 8h Remaining Estimate: 8h {quote}Since you are using ScheduledThreadPoolExecutor.schedule(), the exception was swallowed by the FutureTask. You will have to perform a get() method on the ScheduledFuture, and you will get ExecutionException if there was any exception occured in run().{quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
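The quoted failure mode is that ScheduledThreadPoolExecutor wraps the task in a FutureTask, so an exception thrown in run() is stored rather than logged, and only surfaces when get() is called on the future. Python's executors behave the same way, which makes for a compact stand-in demonstration (not Cassandra's code):

```python
from concurrent.futures import ThreadPoolExecutor

def scheduled_task():
    raise RuntimeError("boom")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(scheduled_task)
    # Nothing is logged at this point: the executor stores the exception
    # inside the Future instead of letting it propagate to any handler.
    try:
        future.result()          # the get() equivalent; this is what surfaces it
        surfaced = None
    except RuntimeError as exc:
        surfaced = str(exc)
```

The fix the ticket implies is the same in either language: something must eventually call get()/result() (or otherwise inspect the future) and log the ExecutionException.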
[jira] [Commented] (CASSANDRA-2394) Faulty hd kills cluster performance
[ https://issues.apache.org/jira/browse/CASSANDRA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018906#comment-13018906 ] Brandon Williams commented on CASSANDRA-2394: - Thibaut, Can you provide more information on how long the degradation lasts and how many nodes are coordinators? Faulty hd kills cluster performance --- Key: CASSANDRA-2394 URL: https://issues.apache.org/jira/browse/CASSANDRA-2394 Project: Cassandra Issue Type: Bug Affects Versions: 0.7.4 Reporter: Thibaut Priority: Minor Fix For: 0.7.5 Hi, About every week, a node from our main cluster (100 nodes) has a faulty hd (listing the Cassandra data storage directory triggers an input/output error). Whenever this occurs, I see many TimeoutExceptions in our application on various nodes, which cause everything to run very, very slowly. Keyrange scans just time out and will sometimes never succeed. If I stop Cassandra on the faulty node, everything runs normally again. It would be great to have some kind of monitoring thread in Cassandra which marks a node as down if there are multiple read/write errors to the data directories. A single faulty hd on 1 node shouldn't affect global cluster performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
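The monitoring-thread idea reduces to an error counter with a threshold. This is a hypothetical sketch (invented names; a real version would need per-directory state, filtering for genuine IO errors, and a recovery path):

```python
class DiskHealthMonitor:
    """Hypothetical: flag a data directory unhealthy after repeated IO errors."""
    def __init__(self, max_errors=3):
        self.max_errors = max_errors
        self.errors = 0

    def record_io_error(self):
        self.errors += 1

    def healthy(self):
        # once the threshold is crossed, the node could mark itself down
        # instead of dragging the whole cluster into timeouts
        return self.errors < self.max_errors

mon = DiskHealthMonitor(max_errors=3)
for _ in range(3):          # three consecutive read/write failures
    mon.record_io_error()
unhealthy = not mon.healthy()
```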
[jira] [Updated] (CASSANDRA-1418) Automatic, online load balancing
[ https://issues.apache.org/jira/browse/CASSANDRA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-1418: -- Fix Version/s: (was: 1.0) Labels: ponies (was: ) Automatic, online load balancing Key: CASSANDRA-1418 URL: https://issues.apache.org/jira/browse/CASSANDRA-1418 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Labels: ponies h2. Goal CASSANDRA-192 began with the intention of implementing full cluster load balancing, but ended up being (wisely) limited to a manual load balancing operation. This issue is an umbrella ticket for finishing the job of implementing automatic, always-on load balancing. It is possible to implement very efficient load balancing operations with a single process directing the rebalancing of all nodes, but avoiding such a central process and allowing individual nodes to make their own movement decisions would be ideal. h2. Components h3. Optimal movements for individual nodes h4. Ruhl One such approach is the Ruhl algorithm described on 192: https://issues.apache.org/jira/browse/CASSANDRA-192#action_12713079 . But as described, it performs excessive movement for large hotspots, and can take a long time to reach equilibrium. Consider the following ring: ||token||load|| |a|5| |c|5| |e|5| |f|40| |k|5| Assuming that node 'a' is the first to discover that 'f' is overloaded: it will apply Case 2, and assume half of 'f's load by moving to 'i', leaving both with 20 units. But this is not a optimal movement, because both 'f' and 'a/i' will still be holding data that they will need to give away. Additionally, 'a/i' can't begin giving the data away until it has finished receiving it. If node 'e' is the first to discover that 'f' is overloaded, it will apply Case 1, and 'f' will give half of its load to 'e' by moving to 'i'. Again, this is a non-optimal movement, because it will result in both 'e' and 'f/i' holding data that they need to give away. h4. 
Adding load awareness to Ruhl Luckily, there appears to be a simple adjustment to the Ruhl algorithm that solves this problem by taking advantage of the fact that Cassandra knows the total load of a cluster, and can use it to calculate the average/ideal load ω. Once node j has decided it should take load from node i (based on the ε value in Ruhl), rather than node j taking 1/2 of the load on node i, it should choose a token such that either i or j ends up with a load within ε*ω of ω. Again considering the ring described above, and assuming ε == 1.0, the total load for the 5 nodes is 60, giving an ω of 12. If node 'a' is the first to discover 'f', it will choose to move to 'j' (a token that takes 12 or ω load units from 'f'), leaving 'f' with a load of 28. When combined with the improvement in the next section, this is closer to being an optimal movement, because 'a/j' will at worst have ε*ω of load to give away, and 'f' is in a position to start more movements. h3. Automatic load balancing Since the Ruhl algorithm only requires a node to make a decision based on itself and one other node, it should be relatively straightforward to add a timer on each node that periodically wakes up and executes the modified Ruhl algorithm if it is not already in the process of moving (based on pending ranges). Automatic balancing should probably be enabled by default, and should have a configurable per-node bandwidth cap. h3. Allowing concurrent moves on a node Allowing a node to give away multiple ranges at once allows for the type of quick balancing that is typically only attributed to vnodes. If a node is a hotspot, such as in the example above, the node should be able to quickly dump the load in a manner that causes minimal load on the rest of the cluster. Rather than transferring to 1 target at 10 MB/s, a hotspot can give to 5 targets at 2 MB/s each. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
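The token-selection arithmetic above can be sketched as follows. This is purely illustrative; `RuhlBalance` and its method names are made up for this example and are not part of Cassandra.

```java
// Sketch of the load-aware variant of the Ruhl balancing step described above.
// All names here are hypothetical; this is not Cassandra's implementation.
class RuhlBalance
{
    // omega: the average (ideal) load across the cluster.
    static double idealLoad(long[] loads)
    {
        long total = 0;
        for (long l : loads)
            total += l;
        return (double) total / loads.length;
    }

    // How much load an underloaded node j should take from overloaded node i:
    // enough that one of them ends up near omega, rather than blindly
    // splitting i's load in half.
    static double loadToTake(double loadOfI, double omega)
    {
        // Taking at most omega units means the taker has at most ~omega
        // to give away later, and i is free to start further movements.
        return Math.min(loadOfI - omega, omega);
    }

    public static void main(String[] args)
    {
        long[] loads = { 5, 5, 5, 40, 5 };   // the ring a, c, e, f, k from the example
        double omega = idealLoad(loads);      // 60 / 5 = 12
        double take = loadToTake(40, omega);  // node 'a' takes 12 units from 'f'
        System.out.println("omega=" + omega + " take=" + take + " f-left=" + (40 - take));
    }
}
```

Running the ring from the example reproduces the numbers in the text: ω is 12, node 'a' takes 12 load units, and 'f' is left with 28.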
[jira] [Commented] (CASSANDRA-1988) Prefer to throw Unavailable rather than Timeout
[ https://issues.apache.org/jira/browse/CASSANDRA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018917#comment-13018917 ] Jonathan Ellis commented on CASSANDRA-1988: --- If we deserialized the messages in OutboundTcpConnection.closeSocket and poked some kind of "unable to send message" status into the callback, then the callback could use that to throw UnavailableException. (Deserializing is probably the easiest way to avoid making a bunch of fairly hairy changes to the MS/OTC flow. Performance is a non-issue since we only do it when a node goes down.) We'd also need to introduce a different exception, since UE signals "I knew I couldn't satisfy the request, so I didn't start it", which is useful to distinguish from "some of the replicas may have performed the write, but not enough". Finally, you might still time out before the FailureDetector signals that the node died, so you still have to deal with the original behavior. Feels like a lot of complexity for a minor corner case. Prefer to throw Unavailable rather than Timeout --- Key: CASSANDRA-1988 URL: https://issues.apache.org/jira/browse/CASSANDRA-1988 Project: Cassandra Issue Type: Improvement Components: API Reporter: Stu Hood Fix For: 1.0 When a node is unreachable, but is not yet being reported dead by gossip, messages are enqueued in the messaging service to be sent when the node becomes available again (on the assumption that the connection dropped temporarily). Higher up in the client layer, before sending messages to other nodes, we check that they are alive according to gossip, and fail fast with UnavailableException if they are not (CASSANDRA-1803). If we send messages to nodes that are not yet being reported dead, the messages sit in queue, and time out rather than being sent: this results in the client request failing with a TimeoutException. 
If we differentiate between messages that were never sent (aka, are still queued in the MessagingService at the end of the timeout), and messages that were sent but didn't get a response, we can properly throw UnavailableException in the former case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
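The queued-versus-sent distinction could look roughly like the sketch below. Everything here (`MessageState`, `onTimeout`, the exception classes) is a hypothetical stand-in, not Cassandra's actual MessagingService API.

```java
// Illustrative sketch only: a message still queued when the timeout fires was
// never sent, so the coordinator can report unavailability rather than a
// timeout. All names are hypothetical, not Cassandra code.
import java.io.IOException;

class CallbackSketch
{
    enum MessageState { QUEUED, SENT, ACKED }

    static class Unavailable extends IOException {}
    static class Timeout extends IOException {}

    // Called when the per-request timer expires.
    static void onTimeout(MessageState state) throws IOException
    {
        switch (state)
        {
            case QUEUED: throw new Unavailable(); // never left the queue: fail fast
            case SENT:   throw new Timeout();     // sent, but no response in time
            case ACKED:  return;                  // already succeeded
        }
    }
}
```

The point is simply that the callback needs one extra bit of state (was the message ever handed to the socket?) to pick the right exception.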
svn commit: r1091508 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/db/
Author: slebresne Date: Tue Apr 12 17:29:08 2011 New Revision: 1091508 URL: http://svn.apache.org/viewvc?rev=1091508view=rev Log: Merging CASSANDRA-2451 from 0.7 Modified: cassandra/branches/cassandra-0.8/ (props changed) cassandra/branches/cassandra-0.8/contrib/ (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/CompactionManager.java Propchange: cassandra/branches/cassandra-0.8/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:29:08 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7:1026516-1091087 +/cassandra/branches/cassandra-0.7:1026516-1091087,1091503 /cassandra/branches/cassandra-0.7.0:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /cassandra/trunk:1090978-1090979 Propchange: cassandra/branches/cassandra-0.8/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:29:08 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1091087 +/cassandra/branches/cassandra-0.7/contrib:1026516-1091087,1091503 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 /cassandra/trunk/contrib:1090978-1090979 
Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:29:08 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1091087 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1091087,1091503 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689 /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090978-1090979 Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:29:08 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1091087 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1091087,1091503 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1051699-1053689 /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1090978-1090979 Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:29:08 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:1026516-1091087
[jira] [Assigned] (CASSANDRA-2427) Heuristic or hard cap to prevent fragmented commit logs from bringing down the server
[ https://issues.apache.org/jira/browse/CASSANDRA-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2427: - Assignee: paul cannon How about this: - create a commitlog-space-cap setting; if space gets above that, flush every dirty CF in the oldest segment and remove it - remove the per-CF memtable_flush_after_minutes setting in favor of the above (this fixes the flush-storm problem of all your CFs flushing at once when you configure this too low) - while we're at it, remove the commitlog segment size config setting, nobody has ever changed it to my knowledge in 2+ years (I don't think we want to add an additional check for is there enough space for a new segment at segment creation time; if there isn't enough space for that there probably isn't enough to flush either and we're screwed either way. So it would be complexity w/o a purpose.) Heuristic or hard cap to prevent fragmented commit logs from bringing down the server - Key: CASSANDRA-2427 URL: https://issues.apache.org/jira/browse/CASSANDRA-2427 Project: Cassandra Issue Type: Improvement Reporter: Benjamin Coverston Assignee: paul cannon Labels: commitlog, hardening Fix For: 1.0 Widely divergent write rates on column families can cause the commit log segments to fragment. In some cases we have seen the commit log partition overrun. One solution here would be to create a heuristic for segment fragmentation to trigger a flush (commit log segments/memtable) or simply track the free disk space and force a global flush when the disk gets to 80% capacity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
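The space-cap behaviour proposed above (flush every CF dirty in the oldest segment, then drop that segment, until back under the cap) can be sketched like this. The `Segment` type and method names are hypothetical stand-ins, not the real commitlog classes.

```java
// Sketch of the proposed commitlog-space-cap: when total segment size exceeds
// the cap, flush the dirty CFs of the oldest segment so it can be removed.
// Types and names are illustrative, not Cassandra's implementation.
import java.util.ArrayDeque;
import java.util.Deque;

class CommitLogCapSketch
{
    static class Segment
    {
        final long size;
        Segment(long size) { this.size = size; }
        void flushDirtyColumnFamilies() { /* would trigger memtable flushes */ }
    }

    final Deque<Segment> segments = new ArrayDeque<>(); // oldest first
    final long spaceCapBytes;

    CommitLogCapSketch(long spaceCapBytes) { this.spaceCapBytes = spaceCapBytes; }

    long totalSize()
    {
        long total = 0;
        for (Segment s : segments)
            total += s.size;
        return total;
    }

    // Flush-and-drop oldest segments until we are back under the cap.
    void enforceCap()
    {
        while (totalSize() > spaceCapBytes && !segments.isEmpty())
        {
            Segment oldest = segments.removeFirst();
            oldest.flushDirtyColumnFamilies();
        }
    }
}
```

Flushing the oldest segment first is what makes this replace memtable_flush_after_minutes without the flush-storm problem: at most one segment's worth of CFs is flushed at a time.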
svn commit: r1091513 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/db/
Author: slebresne Date: Tue Apr 12 17:35:34 2011 New Revision: 1091513 URL: http://svn.apache.org/viewvc?rev=1091513view=rev Log: Merge CASSANDRA-2451 from 0.8 Modified: cassandra/trunk/ (props changed) cassandra/trunk/contrib/ (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/java/org/apache/cassandra/db/CompactionManager.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:35:34 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7:1026516-1090647 +/cassandra/branches/cassandra-0.7:1026516-1090647,1091503 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1091148 +/cassandra/branches/cassandra-0.8:1091148,1091508 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3:774578-796573 /incubator/cassandra/branches/cassandra-0.4:810145-834239,834349-834350 Propchange: cassandra/trunk/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:35:34 2011 @@ -1,9 +1,9 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1090647 +/cassandra/branches/cassandra-0.7/contrib:1026516-1090647,1091503 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 -/cassandra/branches/cassandra-0.8/contrib:1091148 
+/cassandra/branches/cassandra-0.8/contrib:1091148,1091508 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/contrib:774578-796573 -/incubator/cassandra/branches/cassandra-0.4/contrib:810145-834239,834349-834350 +/incubator/cassandra/branches/cassandra-0.4/contrib:810145-810987,810994-834239,834349-834350 /incubator/cassandra/branches/cassandra-0.5/contrib:72-915439 /incubator/cassandra/branches/cassandra-0.6/contrib:911237-922688 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:35:34 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1090647 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1090647,1091503 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 -/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1091148 +/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1091148,1091508 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/Cassandra.java:774578-796573 /incubator/cassandra/branches/cassandra-0.4/interface/gen-java/org/apache/cassandra/service/Cassandra.java:810145-834239,834349-834350 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 17:35:34 2011 @@ -1,7 
+1,7 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1090647 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1090647,1091503 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654 -/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1091148
buildbot success in ASF Buildbot on cassandra-trunk
The Buildbot has detected a restored build on builder cassandra-trunk while building ASF Buildbot. Full details are available at: http://ci.apache.org/builders/cassandra-trunk/builds/1264 Buildbot URL: http://ci.apache.org/ Buildslave for this Build: isis_ubuntu Build Reason: scheduler Build Source Stamp: [branch cassandra/trunk] 1091513 Blamelist: slebresne Build succeeded! sincerely, -The Buildbot
[jira] [Commented] (CASSANDRA-2451) Make clean compactions cleanup the row cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018927#comment-13018927 ] Hudson commented on CASSANDRA-2451: --- Integrated in Cassandra-0.7 #431 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/431/]) Make clean compactions cleanup the row cache patch by slebresne; reviewed by jbellis for CASSANDRA-2451 Make clean compactions cleanup the row cache Key: CASSANDRA-2451 URL: https://issues.apache.org/jira/browse/CASSANDRA-2451 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 0.7.5 Attachments: 0001-Cleanup-cache-during-cleanup-compaction.patch Original Estimate: 1h Remaining Estimate: 1h We uselessly keep in cache keys that are cleaned up, which is not a big deal because they will get expunged eventually, but there is no point in wasting the memory in the meantime. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-2088: Attachment: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch I think there are a few different things here, and we should separate them somehow. Fixing the fact that streaming leaves tmp files around when it fails is a two-line fix, and I think this is simple enough that it could go into 0.7. I'm attaching a patch against 0.7. It's extracted from Aaron's first patch, although rebased on 0.7 (and fixes a bug). Making repair aware that there have been failures is actually more complicated, so that should go into 0.8.1 or something (and should go to CASSANDRA-2433 or another ticket that describes the problem better). Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018942#comment-13018942 ] Jonathan Ellis commented on CASSANDRA-2088: --- bq. I'm attaching a patch against 0.7 Is that 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch? I don't see the connection to .tmp files. (Also: have you verified that the channel will actually infinite-loop returning 0? Kind of odd behavior, although I guess it's technically within-spec.) Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-808) Need a way to skip corrupted data in SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-808. -- Resolution: Won't Fix Fix Version/s: (was: 1.0) nodetool scrub seems to address this adequately now. Need a way to skip corrupted data in SSTables - Key: CASSANDRA-808 URL: https://issues.apache.org/jira/browse/CASSANDRA-808 Project: Cassandra Issue Type: Improvement Reporter: Stu Hood Priority: Minor The new SSTable format will allow for checksumming of the data file, but as it stands, we don't have a better way to handle the situation than throwing an Exception indicating that the data is unreadable. We might want to add an option (triggerable via a command line flag?) to Cassandra that will allow for skipping of corrupted keys/blocks in SSTables, to pretend they don't exist rather than throwing the Exception. An administrator could temporarily enable the option and trigger a compaction to perform a local repair of data, or they could leave it enabled constantly for hands-off recovery. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-2369) support replication decisions per-key
[ https://issues.apache.org/jira/browse/CASSANDRA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2369: - Assignee: paul cannon (was: Jonathan Ellis) I think the goal here is to end up with AbstractReplicationStrategy.getNaturalEndpoints take a DecoratedKey parameter instead of a Token. Ultimately, even assuming that we always start by decorating the key with a token might be assuming too much (and want to decide based on the raw ByteBuffer key) but let's not borrow trouble; it should be a lot easier to make the DecoratedKey change since existing code that wants a token can easily be satisfied that way. support replication decisions per-key - Key: CASSANDRA-2369 URL: https://issues.apache.org/jira/browse/CASSANDRA-2369 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: paul cannon Priority: Minor Fix For: 1.0 Currently the replicationstrategy gets a token and a keyspace with which to decide how to place replicas. for per-row replication this is insufficient because tokenization is lossy (CASSANDRA-1034). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
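The signature change being discussed might look like the sketch below. The types here are heavily simplified stand-ins for Cassandra's `Token` and `DecoratedKey`, not the real classes.

```java
// Sketch of moving replica placement from Token to DecoratedKey, so the
// strategy can see the raw key. Simplified stand-in types, not Cassandra code.
import java.nio.ByteBuffer;
import java.util.List;

interface Token {}

class DecoratedKey
{
    final Token token;
    final ByteBuffer key;
    DecoratedKey(Token token, ByteBuffer key) { this.token = token; this.key = key; }
}

abstract class ReplicationStrategySketch
{
    // Before: placement could only see the (lossy) token.
    //   abstract List<String> getNaturalEndpoints(Token token);

    // After: the full key is available; callers that only need the token can
    // still read it from the DecoratedKey, so existing code is easy to adapt.
    abstract List<String> getNaturalEndpoints(DecoratedKey key);
}
```

Since a `DecoratedKey` carries its token, token-based strategies keep working unchanged while per-key strategies become possible.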
[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018965#comment-13018965 ] Sylvain Lebresne commented on CASSANDRA-2088: - bq. Is that 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch? I don't see the connection to .tmp files. (Also: have you verified that the channel will actually infinite-loop returning 0? Kind of odd behavior, although I guess it's technically within-spec.) Yes. IncomingStreamReader does clean the tmp file when there is an exception (there's an enclosing 'try catch'). The problem is that no exception is raised if the other side of the connection dies. What will happen then is that the read will infinitely read 0 bytes. So this patch actually avoids the infinite loop returning 0 (and I think that answers your second question; sorry that wasn't very clear). Note that without this patch, there is an infinite loop that will hold a socket open forever (and consume cpu, though probably very little in that case). So this is not merely a fix for deleting the tmp files, but it deletes them as a consequence of correctly raising an exception when one should be raised. Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
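The zero-byte-read detection described here is a general NIO pattern: `transferFrom` on a dead peer can keep returning 0 instead of throwing. A minimal standalone version of the pattern might look like this (class and method names are hypothetical; this is not the actual patch):

```java
// Minimal version of the failure-detection pattern discussed above: treat a
// zero-byte transferFrom() as a dead remote end and surface it as an
// IOException, so the enclosing try/catch can clean up the tmp file.
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

class StreamReadSketch
{
    static long receive(FileChannel out, ReadableByteChannel in,
                        long offset, long length, long chunkSize) throws IOException
    {
        long bytesRead = 0;
        while (bytesRead < length)
        {
            long toRead = Math.min(chunkSize, length - bytesRead);
            long lastRead = out.transferFrom(in, offset + bytesRead, toRead);
            // A healthy peer always makes progress; 0 bytes means the other
            // side is gone, so fail instead of spinning forever.
            if (lastRead == 0)
                throw new IOException("Transfer failed: remote end appears dead");
            bytesRead += lastRead;
        }
        return bytesRead;
    }
}
```

Raising the exception is what makes the existing cleanup path run; without it the loop would spin, burning CPU and holding the socket open.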
[jira] [Updated] (CASSANDRA-2326) stress.java indexed range slicing is broken
[ https://issues.apache.org/jira/browse/CASSANDRA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-2326: --- Attachment: CASSANDRA-2326-trunk.patch CASSANDRA-2326.patch -V to generate randomized average size values, old behaviour by default stress.java indexed range slicing is broken --- Key: CASSANDRA-2326 URL: https://issues.apache.org/jira/browse/CASSANDRA-2326 Project: Cassandra Issue Type: Bug Components: Contrib Reporter: Brandon Williams Assignee: Pavel Yaskevich Priority: Trivial Attachments: CASSANDRA-2326-trunk.patch, CASSANDRA-2326.patch I probably broke it when I fixed the build that CASSANDRA-2312 broke. Now it compiles, but never works. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13018972#comment-13018972 ] Jonathan Ellis commented on CASSANDRA-2088: --- +1, and can you move some of that explanation inline as a comment? Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2451) Make clean compactions cleanup the row cache
[ https://issues.apache.org/jira/browse/CASSANDRA-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018977#comment-13018977 ] Hudson commented on CASSANDRA-2451: --- Integrated in Cassandra-0.8 #3 (See [https://hudson.apache.org/hudson/job/Cassandra-0.8/3/]) Merging CASSANDRA-2451 from 0.7 Make clean compactions cleanup the row cache Key: CASSANDRA-2451 URL: https://issues.apache.org/jira/browse/CASSANDRA-2451 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 0.7.5 Attachments: 0001-Cleanup-cache-during-cleanup-compaction.patch Original Estimate: 1h Remaining Estimate: 1h We uselessly keep in cache keys that are cleaned up, which is not a big deal because they will get expunged eventually, but there is no point in wasting the memory in the meantime. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1091542 - /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
Author: slebresne Date: Tue Apr 12 18:59:43 2011 New Revision: 1091542 URL: http://svn.apache.org/viewvc?rev=1091542&view=rev Log: Better detect failure during streaming (always cleaning up tmp files as a consequence) patch by amorton and slebresne; reviewed by jbellis for (the first part of) CASSANDRA-2088 Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java
Modified: cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java?rev=1091542&r1=1091541&r2=1091542&view=diff
==
--- cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java (original)
+++ cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java Tue Apr 12 18:59:43 2011
@@ -84,6 +84,11 @@ public class IncomingStreamReader
 {
     long toRead = Math.min(FileStreamTask.CHUNK_SIZE, length - bytesRead);
     long lastRead = fc.transferFrom(socketChannel, offset + bytesRead, toRead);
+    // if the other side fails, we will not get an exception; instead transferFrom will constantly return 0 bytes read
+    // and we would thus enter an infinite loop. So instead, if no bytes are transferred, we assume the other side is dead and
+    // raise an exception (that will be caught below and 'the right thing' will be done).
+    if (lastRead == 0)
+        throw new IOException("Transfer failed for remote file " + remoteFile);
     bytesRead += lastRead;
     remoteFile.progress += lastRead;
 }
svn commit: r1091544 - in /cassandra/branches/cassandra-0.8: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/streaming/
Author: slebresne Date: Tue Apr 12 19:06:18 2011 New Revision: 1091544 URL: http://svn.apache.org/viewvc?rev=1091544view=rev Log: Merge 2088 (first part) from 0.7 Modified: cassandra/branches/cassandra-0.8/ (props changed) cassandra/branches/cassandra-0.8/contrib/ (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java Propchange: cassandra/branches/cassandra-0.8/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:06:18 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7:1026516-1091087,1091503 +/cassandra/branches/cassandra-0.7:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /cassandra/trunk:1090978-1090979 Propchange: cassandra/branches/cassandra-0.8/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:06:18 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1091087,1091503 +/cassandra/branches/cassandra-0.7/contrib:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 
/cassandra/trunk/contrib:1090978-1090979 Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:06:18 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1091087,1091503 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689 /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090978-1090979 Propchange: cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:06:18 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1091087,1091503 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1051699-1053689 /cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1090978-1090979 Propchange: 
cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:06:18 2011 @@ -1,5 +1,5 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java:1026516-1091087,1091503
svn commit: r1091547 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/streaming/
Author: slebresne Date: Tue Apr 12 19:08:52 2011 New Revision: 1091547 URL: http://svn.apache.org/viewvc?rev=1091547view=rev Log: Merge CASSANDRA-2088 (first part) from 0.8 Modified: cassandra/trunk/ (props changed) cassandra/trunk/contrib/ (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/java/org/apache/cassandra/streaming/IncomingStreamReader.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:08:52 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7:1026516-1090647,1091503 +/cassandra/branches/cassandra-0.7:1026516-1090647,1091503,1091542 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1091148,1091508 +/cassandra/branches/cassandra-0.8:1091148,1091508,1091544 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3:774578-796573 /incubator/cassandra/branches/cassandra-0.4:810145-834239,834349-834350 Propchange: cassandra/trunk/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:08:52 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1090647,1091503 +/cassandra/branches/cassandra-0.7/contrib:1026516-1090647,1091503,1091542 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 
-/cassandra/branches/cassandra-0.8/contrib:1091148,1091508 +/cassandra/branches/cassandra-0.8/contrib:1091148,1091508,1091544 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/contrib:774578-796573 /incubator/cassandra/branches/cassandra-0.4/contrib:810145-810987,810994-834239,834349-834350 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:08:52 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1090647,1091503 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1090647,1091503,1091542 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654 -/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1091148,1091508 +/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1091148,1091508,1091544 /cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/interface/gen-java/org/apache/cassandra/service/Cassandra.java:774578-796573 /incubator/cassandra/branches/cassandra-0.4/interface/gen-java/org/apache/cassandra/service/Cassandra.java:810145-834239,834349-834350 Propchange: cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 19:08:52 2011 @@ -1,7 +1,7 @@ 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1090647,1091503 +/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1026516-1090647,1091503,1091542 /cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1053690-1055654 -/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:1091148,1091508
[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019005#comment-13019005 ] Sylvain Lebresne commented on CASSANDRA-2088: - Committed that first part. I think we should keep this open to fix the tmp files for failed compaction and move the rest to another ticket (CASSANDRA-2433, for instance). About the attached patch on cleaning up failed compaction: * We should also handle cleanup and scrub. * We should handle SSTableWriter.Builder, as it is yet another place where we could miss cleaning up a tmp file on error. * In theory a failed flush could leave a tmp file behind. If that happens, a leftover tmp file would be the least of your problems, but for completeness' sake we could handle it. * The logging when failing to close iwriter and dataFile in SSTableWriter could probably go at ERROR (we should not be failing there; if we do, something is wrong). * That's a nitpick, but I'm not a huge fan of catching RuntimeException in this case, as it pollutes the code for something that would be a programming error (that's probably debatable though). Maybe another solution would be to put this in a finally block. That means making sure closeAndDelete() is OK with the file being already closed and/or deleted, and having the finally block run *after* the closeAndOpenReader call. Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
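The finally-block suggestion in the comment above can be sketched as follows. This is a hypothetical illustration, not the committed patch: the Writer class and its closeAndDelete() / closeAndOpenReader() methods merely mimic the SSTableWriter methods named in the discussion.

```java
import java.io.File;

// Hypothetical sketch only: Writer stands in for SSTableWriter.
class TmpFileCleanupSketch
{
    static class Writer
    {
        private final File tmp;
        private boolean closed = false;

        Writer(File tmp) { this.tmp = tmp; }

        void append(String row) { /* write the row to the tmp file */ }

        // Success path: mark the writer closed so the finally block
        // below becomes a no-op, then hand back the finished "reader".
        Object closeAndOpenReader()
        {
            closed = true;
            return new Object();
        }

        // Must be idempotent: safe to call when the file is already
        // closed and/or deleted, as the comment requires.
        void closeAndDelete()
        {
            if (closed)
                return;
            closed = true;
            tmp.delete(); // ignore the result; the file may already be gone
        }
    }

    // The cleanup lives in finally, *after* closeAndOpenReader(), so any
    // failure (IO error and RuntimeException alike) removes the tmp file
    // without needing an explicit catch for each exception type.
    static Object writeSortedContents(Writer writer, Iterable<String> rows)
    {
        try
        {
            for (String row : rows)
                writer.append(row);
            return writer.closeAndOpenReader();
        }
        finally
        {
            writer.closeAndDelete();
        }
    }
}
```

Because closeAndDelete() checks the closed flag first, the success path leaves the finished file untouched while every failure path deletes the partial one.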
[jira] [Created] (CASSANDRA-2462) Fix build for distributed and stress tests
Fix build for distributed and stress tests -- Key: CASSANDRA-2462 URL: https://issues.apache.org/jira/browse/CASSANDRA-2462 Project: Cassandra Issue Type: Bug Reporter: Stu Hood Assignee: Stu Hood Attachments: 0001-Update-stress-and-tests-for-trunk.txt Distributed and stress tests are not compiling for trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2462) Fix build for distributed and stress tests
[ https://issues.apache.org/jira/browse/CASSANDRA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-2462: Attachment: 0001-Update-stress-and-tests-for-trunk.txt Fix build for distributed and stress tests -- Key: CASSANDRA-2462 URL: https://issues.apache.org/jira/browse/CASSANDRA-2462 Project: Cassandra Issue Type: Bug Reporter: Stu Hood Assignee: Stu Hood Attachments: 0001-Update-stress-and-tests-for-trunk.txt Distributed and stress tests are not compiling for trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1091572 [3/3] - in /cassandra/trunk: ./ contrib/ drivers/py/cql/cassandra/ drivers/txpy/txcql/cassandra/ interface/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/avro/ src/ja
Modified: cassandra/trunk/drivers/txpy/txcql/cassandra/constants.py URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/txpy/txcql/cassandra/constants.py?rev=1091572&r1=1091571&r2=1091572&view=diff == --- cassandra/trunk/drivers/txpy/txcql/cassandra/constants.py (original) +++ cassandra/trunk/drivers/txpy/txcql/cassandra/constants.py Tue Apr 12 21:08:08 2011 @@ -7,4 +7,4 @@ from thrift.Thrift import * from ttypes import * -VERSION = "20.0.0" +VERSION = "20.1.0" Modified: cassandra/trunk/drivers/txpy/txcql/cassandra/ttypes.py URL: http://svn.apache.org/viewvc/cassandra/trunk/drivers/txpy/txcql/cassandra/ttypes.py?rev=1091572&r1=1091571&r2=1091572&view=diff == --- cassandra/trunk/drivers/txpy/txcql/cassandra/ttypes.py (original) +++ cassandra/trunk/drivers/txpy/txcql/cassandra/ttypes.py Tue Apr 12 21:08:08 2011 @@ -2324,6 +2324,7 @@ class CfDef: - merge_shards_chance - key_validation_class - row_cache_provider + - key_alias thrift_spec = ( @@ -2355,9 +2356,10 @@ class CfDef: (25, TType.DOUBLE, 'merge_shards_chance', None, None, ), # 25 (26, TType.STRING, 'key_validation_class', None, None, ), # 26 (27, TType.STRING, 'row_cache_provider', None, "org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider", ), # 27 +(28, TType.STRING, 'key_alias', None, None, ), # 28 ) - def __init__(self, keyspace=None, name=None, column_type=thrift_spec[3][4], comparator_type=thrift_spec[5][4], subcomparator_type=None, comment=None, row_cache_size=thrift_spec[9][4], key_cache_size=thrift_spec[11][4], read_repair_chance=thrift_spec[12][4], column_metadata=None, gc_grace_seconds=None, default_validation_class=None, id=None, min_compaction_threshold=None, max_compaction_threshold=None, row_cache_save_period_in_seconds=None, key_cache_save_period_in_seconds=None, memtable_flush_after_mins=None, memtable_throughput_in_mb=None, memtable_operations_in_millions=None, replicate_on_write=None, merge_shards_chance=None, key_validation_class=None, row_cache_provider=thrift_spec[27][4],): + def 
__init__(self, keyspace=None, name=None, column_type=thrift_spec[3][4], comparator_type=thrift_spec[5][4], subcomparator_type=None, comment=None, row_cache_size=thrift_spec[9][4], key_cache_size=thrift_spec[11][4], read_repair_chance=thrift_spec[12][4], column_metadata=None, gc_grace_seconds=None, default_validation_class=None, id=None, min_compaction_threshold=None, max_compaction_threshold=None, row_cache_save_period_in_seconds=None, key_cache_save_period_in_seconds=None, memtable_flush_after_mins=None, memtable_throughput_in_mb=None, memtable_operations_in_millions=None, replicate_on_write=None, merge_shards_chance=None, key_validation_class=None, row_cache_provider=thrift_spec[27][4], key_alias=None,): self.keyspace = keyspace self.name = name self.column_type = column_type @@ -2382,6 +2384,7 @@ class CfDef: self.merge_shards_chance = merge_shards_chance self.key_validation_class = key_validation_class self.row_cache_provider = row_cache_provider +self.key_alias = key_alias def read(self, iprot): if iprot.__class__ == TBinaryProtocol.TBinaryProtocolAccelerated and isinstance(iprot.trans, TTransport.CReadableTransport) and self.thrift_spec is not None and fastbinary is not None: @@ -2518,6 +2521,11 @@ class CfDef: self.row_cache_provider = iprot.readString(); else: iprot.skip(ftype) + elif fid == 28: +if ftype == TType.STRING: + self.key_alias = iprot.readString(); +else: + iprot.skip(ftype) else: iprot.skip(ftype) iprot.readFieldEnd() @@ -2627,6 +2635,10 @@ class CfDef: oprot.writeFieldBegin('row_cache_provider', TType.STRING, 27) oprot.writeString(self.row_cache_provider) oprot.writeFieldEnd() +if self.key_alias != None: + oprot.writeFieldBegin('key_alias', TType.STRING, 28) + oprot.writeString(self.key_alias) + oprot.writeFieldEnd() oprot.writeFieldStop() oprot.writeStructEnd() def validate(self): Modified: cassandra/trunk/interface/cassandra.thrift URL: 
http://svn.apache.org/viewvc/cassandra/trunk/interface/cassandra.thrift?rev=1091572&r1=1091571&r2=1091572&view=diff == --- cassandra/trunk/interface/cassandra.thrift (original) +++ cassandra/trunk/interface/cassandra.thrift Tue Apr 12 21:08:08 2011 @@ -46,7 +46,7 @@ namespace rb CassandraThrift # for every edit that doesn't result in a change to major/minor. # # See the Semantic Versioning Specification (SemVer) http://semver.org. -const string VERSION = "20.0.0" +const string VERSION = "20.1.0" # @@ -394,6 +394,7 @@ struct CfDef { 25: optional double merge_shards_chance, 26:
svn commit: r1091572 [1/3] - in /cassandra/trunk: ./ contrib/ drivers/py/cql/cassandra/ drivers/txpy/txcql/cassandra/ interface/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/avro/ src/ja
Author: jbellis Date: Tue Apr 12 21:08:08 2011 New Revision: 1091572 URL: http://svn.apache.org/viewvc?rev=1091572view=rev Log: merge from 0.8 Modified: cassandra/trunk/ (props changed) cassandra/trunk/CHANGES.txt cassandra/trunk/contrib/ (props changed) cassandra/trunk/drivers/py/cql/cassandra/constants.py cassandra/trunk/drivers/py/cql/cassandra/ttypes.py cassandra/trunk/drivers/txpy/txcql/cassandra/Cassandra.py cassandra/trunk/drivers/txpy/txcql/cassandra/constants.py cassandra/trunk/drivers/txpy/txcql/cassandra/ttypes.py cassandra/trunk/interface/cassandra.thrift cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/CfDef.java cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Constants.java cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/avro/internode.genavro cassandra/trunk/src/java/org/apache/cassandra/config/CFMetaData.java cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java cassandra/trunk/src/java/org/apache/cassandra/thrift/CassandraServer.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 21:08:08 2011 @@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1081914,1083000 -/cassandra/branches/cassandra-0.7:1026516-1090647,1091503,1091542 +/cassandra/branches/cassandra-0.7:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0:1053690-1055654 -/cassandra/branches/cassandra-0.8:1091148,1091508,1091544 
+/cassandra/branches/cassandra-0.8:1090935-109,1091113,1091148,1091508,1091544 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3:774578-796573 /incubator/cassandra/branches/cassandra-0.4:810145-834239,834349-834350 Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1091572r1=1091571r2=1091572view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Tue Apr 12 21:08:08 2011 @@ -6,11 +6,10 @@ 2124, 2302, 2277) * avoid double RowMutation serialization on write path (CASSANDRA-1800) * make NetworkTopologyStrategy the default (CASSANDRA-1960) - * configurable internode encryption (CASSANDRA-1567) + * configurable internode encryption (CASSANDRA-1567, 2152) * human readable column names in sstable2json output (CASSANDRA-1933) * change default JMX port to 7199 (CASSANDRA-2027) * backwards compatible internal messaging (CASSANDRA-1015) - * check for null encryption in MessagingService (CASSANDRA-2152) * atomic switch of memtables and sstables (CASSANDRA-2284) * add pluggable SeedProvider (CASSANDRA-1669) * Fix clustertool to not throw exception when calling get_endpoints (CASSANDRA-2437) @@ -21,6 +20,8 @@ * give snapshots the same name on each node (CASSANDRA-1791) * multithreaded compaction (CASSANDRA-2191) * compaction throttling (CASSANDRA-2156) + * add key type information and alias (CASSANDRA-2311, 2396) + 0.7.5 * Avoid seeking when sstable2json exports the entire file (CASSANDRA-2318) @@ -46,7 +47,9 @@ index (CASSANDRA-2376) * fix race condition that could leave orphaned data files when dropping CF or KS (CASSANDRA-2381) + * convert mmap assertion to if/throw so scrub can catch it (CASSANDRA-2417) * Try harder to close files after compaction (CASSANDRA-2431) + * re-set bootstrapped flag after move finishes (CASSANDRA-2435) 0.7.4 Propchange: cassandra/trunk/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Apr 12 21:08:08 2011 
@@ -1,7 +1,7 @@ /cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009 -/cassandra/branches/cassandra-0.7/contrib:1026516-1090647,1091503,1091542 +/cassandra/branches/cassandra-0.7/contrib:1026516-1091087,1091503,1091542 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654 -/cassandra/branches/cassandra-0.8/contrib:1091148,1091508,1091544 +/cassandra/branches/cassandra-0.8/contrib:1090935-109,1091113,1091148,1091508,1091544 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 /incubator/cassandra/branches/cassandra-0.3/contrib:774578-796573
svn commit: r1091576 - in /cassandra/branches/cassandra-0.8: test/distributed/org/apache/cassandra/ tools/stress/src/org/apache/cassandra/stress/
Author: jbellis Date: Tue Apr 12 21:21:30 2011 New Revision: 1091576 URL: http://svn.apache.org/viewvc?rev=1091576view=rev Log: fixes for replicationFactor change patch by Stu Hood; reviewed by jbellis for CASSANDRA-2462 Modified: cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/CountersTest.java cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MovementTest.java cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MutationTest.java cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/TestBase.java cassandra/branches/cassandra-0.8/tools/stress/src/org/apache/cassandra/stress/Session.java Modified: cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/CountersTest.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/CountersTest.java?rev=1091576r1=1091575r2=1091576view=diff == --- cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/CountersTest.java (original) +++ cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/CountersTest.java Tue Apr 12 21:21:30 2011 @@ -72,7 +72,7 @@ public class CountersTest extends TestBa { ByteBuffer bname = ByteBuffer.wrap(name.getBytes()); ColumnPath cpath = new ColumnPath(cf).setColumn(bname); -CounterColumn col = client.get_counter(key, cpath, cl).column; +CounterColumn col = client.get(key, cpath, cl).counter_column; assertEquals(bname, col.name); assertEquals(value.longValue(), col.value); } Modified: cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MovementTest.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MovementTest.java?rev=1091576r1=1091575r2=1091576view=diff == --- cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MovementTest.java (original) +++ cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MovementTest.java 
Tue Apr 12 21:21:30 2011 @@ -29,6 +29,7 @@ import java.util.*; import org.apache.cassandra.thrift.*; import org.apache.cassandra.tools.NodeProbe; +import org.apache.cassandra.utils.ByteBufferUtil; import org.apache.cassandra.utils.WrappedRunnable; import org.apache.cassandra.CassandraServiceController.Failure; Modified: cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MutationTest.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MutationTest.java?rev=1091576r1=1091575r2=1091576view=diff == --- cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MutationTest.java (original) +++ cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/MutationTest.java Tue Apr 12 21:21:30 2011 @@ -18,6 +18,7 @@ package org.apache.cassandra; +import java.io.IOException; import java.net.InetAddress; import java.nio.ByteBuffer; import java.util.*; @@ -25,7 +26,11 @@ import java.util.*; import org.slf4j.Logger; import org.slf4j.LoggerFactory; +import org.apache.cassandra.client.RingCache; +import org.apache.cassandra.dht.RandomPartitioner; +import org.apache.cassandra.service.StorageService; import org.apache.cassandra.thrift.*; +import org.apache.cassandra.utils.ByteBufferUtil; import org.apache.cassandra.utils.WrappedRunnable; import org.apache.thrift.TException; Modified: cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/TestBase.java URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/TestBase.java?rev=1091576r1=1091575r2=1091576view=diff == --- cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/TestBase.java (original) +++ cassandra/branches/cassandra-0.8/test/distributed/org/apache/cassandra/TestBase.java Tue Apr 12 21:21:30 2011 @@ -82,13 +82,14 @@ public abstract class TestBase { ListInetAddress hosts = controller.getHosts(); Cassandra.Client client = 
controller.createClient(hosts.get(0)); - +Map<String,String> stratOptions = new HashMap<String,String>(); +stratOptions.put("replication_factor", "" + rf); client.system_add_keyspace( new KsDef( name, "org.apache.cassandra.locator.SimpleStrategy", -rf, -Arrays.asList(cfdef))); +Arrays.asList(cfdef)) +.setStrategy_options(stratOptions));
[jira] [Updated] (CASSANDRA-2462) Fix build for distributed and stress tests
[ https://issues.apache.org/jira/browse/CASSANDRA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2462: -- Component/s: Tools Tests Priority: Minor (was: Major) Affects Version/s: 0.8 Fix Version/s: 0.8 committed, minus the default value for RF in Stress.Session (which we want to omit for NTS) Fix build for distributed and stress tests -- Key: CASSANDRA-2462 URL: https://issues.apache.org/jira/browse/CASSANDRA-2462 Project: Cassandra Issue Type: Bug Components: Tests, Tools Affects Versions: 0.8 Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 0.8 Attachments: 0001-Update-stress-and-tests-for-trunk.txt Distributed and stress tests are not compiling for trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Fix For: 0.7.4 Currently, Cassandra 0.7.x allocates a 256MB contiguous byte array at the beginning of a memtable flush or compaction (presently hard-coded as Config.in_memory_compaction_limit_in_mb). When several memtable flushes are triggered at once (as by `nodetool flush` or `nodetool snapshot`), the tenured generation will typically experience extreme pressure as it attempts to locate [n] contiguous 256mb chunks of heap to allocate. This will often trigger a promotion failure, resulting in a stop-the-world GC until the allocation can be made. (Note that in the case of the release valve being triggered, the problem is even further exacerbated; the release valve will ironically trigger two contiguous 256MB allocations when attempting to flush the two largest memtables). This patch sets the buffer to be used by BufferedRandomAccessFile to Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE) rather than a hard-coded 256MB. The typical resulting buffer size is 64kb. I've taken some time to measure the impact of this change on the base 0.7.4 release and with this patch applied. This test involved launching Cassandra, performing four million writes across three column families from three clients, and monitoring heap usage and garbage collections. Cassandra was launched with 2GB of heap and the default JVM options shipped with the project. This configuration has 7 column families with a total of 15GB of data. Here's the base 0.7.4 release: http://cl.ly/413g2K06121z252e2t10 Note that on launch, we see a flush + compaction triggered almost immediately, resulting in at least 7x very quick 256MB allocations maxing out the heap, resulting in a promotion failure and a full GC. 
As flushes proceed, we see that most of these have a corresponding CMS, consistent with the pattern of a large allocation and immediate collection. We see a second promotion failure and full GC at the 75% mark as the allocations cannot be satisfied without a collection, along with several CMSs in between. In the failure cases, the allocation requests occur so quickly that a standard CMS phase cannot be completed before a ParNew attempts to promote the surviving byte array into the tenured generation. The heap usage and GC profile of this graph is very unhealthy. Here's the 0.7.4 release with this patch applied: http://cl.ly/050I1g26401B1X0w3s1f This graph is very different. At launch, rather than an immediate spike to full allocation and a promotion failure, we see a slow allocation slope reaching only 1/8th of total heap size. As writes begin, we see several flushes and compactions, but none result in immediate, large allocations. The ParNew collector keeps up with collections far more ably, resulting in only one healthy CMS collection with no promotion failure. Unlike the unhealthy rapid allocation and massive collection pattern we see in the first graph, this graph depicts a healthy sawtooth pattern of ParNews and an occasional effective CMS with no danger of heap fragmentation resulting in a promotion failure. The bottom line is that there's no need to allocate a hard-coded 256MB write buffer for flushing memtables and compactions to disk. Doing so results in unhealthy rapid allocation patterns and increases the probability of triggering promotion failures and full stop-the-world GCs which can cause nodes to become unresponsive and shunned from the ring during flushes and compactions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
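The buffer-size clamp the description proposes can be sketched as a standalone helper. This is illustrative only: the class and method names are invented for the sketch, and DEFAULT_BUFFER_SIZE mirrors the 64 KB BufferedRandomAccessFile default mentioned in the ticket rather than being taken from the actual patch.

```java
// Illustrative only: DEFAULT_BUFFER_SIZE mirrors the 64 KB default of
// BufferedRandomAccessFile referenced in the ticket.
class FlushBufferSize
{
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    // Clamp the write buffer to min(bytesToWrite, 64 KB) instead of a
    // hard-coded 256 MB; an unknown size (<= 0) falls back to the default.
    static int chooseBufferSize(long bytesToWrite)
    {
        if (bytesToWrite <= 0)
            return DEFAULT_BUFFER_SIZE;
        return (int) Math.min(bytesToWrite, (long) DEFAULT_BUFFER_SIZE);
    }
}
```

With this clamp, a small memtable flush gets a buffer sized to its actual contents, and even the largest flush never allocates more than one 64 KB contiguous chunk per writer.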
[jira] [Updated] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] C. Scott Andreas updated CASSANDRA-2463: Comment: was deleted (was: diff --git a/src/java/org/apache/cassandra/db/BinaryMemtable.java b/src/java/org/apache/cassandra/db/BinaryMemtable.java index 4b4e2ff..14665ad 100644 --- a/src/java/org/apache/cassandra/db/BinaryMemtable.java +++ b/src/java/org/apache/cassandra/db/BinaryMemtable.java @@ -125,7 +125,7 @@ public class BinaryMemtable implements IFlushable private SSTableReader writeSortedContents(List<DecoratedKey> sortedKeys) throws IOException { logger.info("Writing " + this); -SSTableWriter writer = cfs.createFlushWriter(sortedKeys.size()); +SSTableWriter writer = cfs.createFlushWriter(sortedKeys.size(), currentSize.get()); for (DecoratedKey key : sortedKeys) { diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java index 8ff9f82..14e984b 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java +++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java @@ -2183,9 +2183,9 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean } } -public SSTableWriter createFlushWriter(long estimatedRows) throws IOException +public SSTableWriter createFlushWriter(long estimatedRows, long estimatedSize) throws IOException { -return new SSTableWriter(getFlushPath(), estimatedRows, metadata, partitioner); +return new SSTableWriter(getFlushPath(), estimatedRows, estimatedSize, metadata, partitioner); } public SSTableWriter createCompactionWriter(long estimatedRows, String location) throws IOException diff --git a/src/java/org/apache/cassandra/db/Memtable.java b/src/java/org/apache/cassandra/db/Memtable.java index db65f01..3acb7a9 100644 --- a/src/java/org/apache/cassandra/db/Memtable.java +++ b/src/java/org/apache/cassandra/db/Memtable.java @@ -155,7 +155,7 @@ public class Memtable implements Comparable<Memtable>, IFlushable 
private SSTableReader writeSortedContents() throws IOException { logger.info("Writing " + this); -SSTableWriter writer = cfs.createFlushWriter(columnFamilies.size()); +SSTableWriter writer = cfs.createFlushWriter(columnFamilies.size(), currentThroughput.get()); for (Map.Entry<DecoratedKey, ColumnFamily> entry : columnFamilies.entrySet()) writer.append(entry.getKey(), entry.getValue()); diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java b/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java index 809a3f4..d05542f 100644 --- a/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java +++ b/src/java/org/apache/cassandra/io/sstable/SSTableWriter.java @@ -25,6 +25,7 @@ import java.util.Arrays; import java.util.Collections; import java.util.HashSet; import java.util.Set; +import java.lang.Math; import com.google.common.collect.Sets; @@ -65,7 +66,7 @@ public class SSTableWriter extends SSTable this(filename, keyCount, DatabaseDescriptor.getCFMetaData(Descriptor.fromFilename(filename)), StorageService.getPartitioner()); } -public SSTableWriter(String filename, long keyCount, CFMetaData metadata, IPartitioner partitioner) throws IOException +public SSTableWriter(String filename, long keyCount, long bufferSize, CFMetaData metadata, IPartitioner partitioner) throws IOException { super(Descriptor.fromFilename(filename), new HashSet<Component>(Arrays.asList(Component.DATA, Component.FILTER, Component.PRIMARY_INDEX, Component.STATS)), @@ -75,7 +76,17 @@ public class SSTableWriter extends SSTable SSTable.defaultColumnHistogram()); iwriter = new IndexWriter(descriptor, partitioner, keyCount); dbuilder = SegmentedFile.getBuilder(DatabaseDescriptor.getDiskAccessMode()); -dataFile = new BufferedRandomAccessFile(new File(getFilename()), "rw", DatabaseDescriptor.getInMemoryCompactionLimit(), true); + +if (bufferSize == 0) +bufferSize = BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE; +else +bufferSize = Math.min(bufferSize, 
BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE); + +dataFile = new BufferedRandomAccessFile(new File(getFilename()), "rw", (int) bufferSize, true); +} + +public SSTableWriter(String filename, long keyCount, CFMetaData metadata, IPartitioner partitioner) throws IOException { +this(filename, keyCount, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE, metadata, partitioner); } public void mark() ) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL:
[jira] [Updated] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] C. Scott Andreas updated CASSANDRA-2463: Attachment: patch.diff Patch attached. Applies cleanly to tag 'cassandra-0.7.4'. All tests pass. Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: patch.diff Original Estimate: 72h Remaining Estimate: 72h Currently, Cassandra 0.7.x allocates a 256MB contiguous byte array at the beginning of a memtable flush or compaction (presently hard-coded as Config.in_memory_compaction_limit_in_mb). When several memtable flushes are triggered at once (as by `nodetool flush` or `nodetool snapshot`), the tenured generation will typically experience extreme pressure as it attempts to locate [n] contiguous 256MB chunks of heap to allocate. This will often trigger a promotion failure, resulting in a stop-the-world GC until the allocation can be made. (Note that in the case of the release valve being triggered, the problem is even further exacerbated; the release valve will ironically trigger two contiguous 256MB allocations when attempting to flush the two largest memtables). This patch sets the buffer to be used by BufferedRandomAccessFile to Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE) rather than a hard-coded 256MB. The typical resulting buffer size is 64KB. I've taken some time to measure the impact of this change on the base 0.7.4 release and with this patch applied. This test involved launching Cassandra, performing four million writes across three column families from three clients, and monitoring heap usage and garbage collections. Cassandra was launched with 2GB of heap and the default JVM options shipped with the project. 
This configuration has 7 column families with a total of 15GB of data. Here's the base 0.7.4 release: http://cl.ly/413g2K06121z252e2t10 Note that on launch, we see a flush + compaction triggered almost immediately, resulting in at least 7x very quick 256MB allocations maxing out the heap, resulting in a promotion failure and a full GC. As flushes proceed, we see that most of these have a corresponding CMS, consistent with the pattern of a large allocation and immediate collection. We see a second promotion failure and full GC at the 75% mark as the allocations cannot be satisfied without a collection, along with several CMSs in between. In the failure cases, the allocation requests occur so quickly that a standard CMS phase cannot complete before a ParNew attempts to promote the surviving byte array into the tenured generation. The heap usage and GC profile of this graph is very unhealthy. Here's the 0.7.4 release with this patch applied: http://cl.ly/050I1g26401B1X0w3s1f This graph is very different. At launch, rather than an immediate spike to full allocation and a promotion failure, we see a slow allocation slope reaching only 1/8th of total heap size. As writes begin, we see several flushes and compactions, but none result in immediate, large allocations. The ParNew collector keeps up with collections far more ably, resulting in only one healthy CMS collection with no promotion failure. Unlike the unhealthy rapid allocation and massive collection pattern we see in the first graph, this graph depicts a healthy sawtooth pattern of ParNews and an occasional effective CMS with no danger of heap fragmentation resulting in a promotion failure. The bottom line is that there's no need to allocate a hard-coded 256MB write buffer for flushing memtables and compactions to disk. 
Doing so results in unhealthy rapid allocation patterns and increases the probability of triggering promotion failures and full stop-the-world GCs which can cause nodes to become unresponsive and shunned from the ring during flushes and compactions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
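The clamp this patch describes, Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE), can be sketched as below. This is an illustration only: the class and method names are invented for the example, not the actual patch, which modifies SSTableWriter directly.

```java
public class BufferSizeSketch {
    // 0.7.x's BufferedRandomAccessFile defaults to a 64 KB buffer.
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    // Clamp the write buffer to the smaller of the expected write size and
    // the 64 KB default, instead of a fixed 256 MB contiguous allocation.
    // Treat 0 as "size unknown" and fall back to the default.
    static int chooseBufferSize(long bytesToWrite) {
        if (bytesToWrite == 0)
            return DEFAULT_BUFFER_SIZE;
        return (int) Math.min(bytesToWrite, DEFAULT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        // A 256 MB flush no longer implies a 256 MB buffer allocation.
        System.out.println(chooseBufferSize(256L * 1024 * 1024));
    }
}
```

Small memtables get correspondingly small buffers, while large flushes are capped at the default, so the tenured generation never has to find a 256MB contiguous region.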
[Cassandra Wiki] Update of ArchitectureInternals by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ArchitectureInternals page has been changed by PeterSchuller. The comment on this change is: Augment the read path description a bit to account better for read repair and snitches.. http://wiki.apache.org/cassandra/ArchitectureInternals?action=diffrev1=21rev2=22 -- * See [[ArchitectureSSTable]] and ArchitectureCommitLog for more details = Read path = - * !StorageProxy gets the nodes responsible for replicas of the keys from the !ReplicationStrategy, then sends read messages to them + * !StorageProxy gets the endpoints (nodes) responsible for replicas of the keys from the !ReplicationStrategy * This may be a !SliceFromReadCommand, a !SliceByNamesReadCommand, or a !RangeSliceReadCommand, depending + * StorageProxy filters the endpoints to contain only those that are currently up/alive + * StorageProxy then sorts, by asking the endpoint snitch, the responsible nodes by proximity. +* The definition of proximity is up to the endpoint snitch + * With a SimpleSnitch, proximity directly corresponds to proximity on the token ring. + * With the NetworkTopologySnitch, endpoints that are in the same rack are always considered closer than those that are not. Failing that, endpoints in the same data center are always considered closer than those that are not. + * The DynamicSnitch, typically enabled in the configuration, wraps whatever underlying snitch (such as SimpleSnitch and NetworkTopologySnitch) so as to dynamically adjust the perceived closeness of endpoints based on their recent performance. This is in an effort to try to avoid routing traffic to endpoints that are slow to respond. + * StorageProxy then arranges for messages to be sent to nodes as required: +* The closest node (as determined by proximity sorting as described above) will be sent a command to perform an actual data read (i.e., return data to the co-ordinating node). 
+* As required by consistency level, additional nodes may be sent digest commands, asking them to perform the read locally but send back the digest only. For example, at replication factor 3 a read at consistency level QUORUM would require one digest read in addition to the data read sent to the closest node. (See ReadCallback, instantiated by StorageProxy) +* If read repair is enabled (probabilistically if read repair chance is somewhere between 0% and 100%), remaining nodes responsible for the row will be sent messages to compute the digest of the response. (Again, see ReadCallback, instantiated by StorageProxy) * On the data node, !ReadVerbHandler gets the data from CFS.getColumnFamily or CFS.getRangeSlice and sends it back as a !ReadResponse * The row is located by doing a binary search on the index in SSTableReader.getPosition * For single-row requests, we use a !QueryFilter subclass to pick the data from the Memtable and SSTables that we are looking for. The Memtable read is straightforward. The SSTable read is a little different depending on which kind of request it is: @@ -30, +40 @@ * If we are reading a group of columns by name, we still use the column index to locate each column, but first we check the row-level bloom filter to see if we need to do anything at all * The column readers provide an Iterator interface, so the filter can easily stop when it's done, without reading more columns than necessary * Since we need to potentially merge columns from multiple SSTable versions, the reader iterators are combined through a !ReducingIterator, which takes an iterator of uncombined columns as input, and yields combined versions as output - * If a quorum read was requested, !StorageProxy waits for a majority of nodes to reply and makes sure the answers match before returning. Otherwise, it returns the data reply as soon as it gets it, and checks the other replies for discrepancies in the background in !StorageService.doConsistencyCheck. 
This is called read repair, and also helps achieve consistency sooner. -* As an optimization, !StorageProxy only asks the closest replica for the actual data; the other replicas are asked only to compute a hash of the data. + + In addition: + * At any point if a message is destined for the local node, the appropriate piece of work (data read or digest read) is directly submitted to the appropriate local stage (see StageManager) rather than going through messaging over the network. + * The fact that a data read is only submitted to the closest replica is intended as an optimization to avoid sending excessive amounts of data over the network. A digest read will take the full cost of a read internally on the node (CPU and in particular disk), but will avoid taxing the network. = Deletes = * See DistributedDeletes
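The endpoint handling described in the read path above (one data read to the closest live endpoint, digest reads to however many more the consistency level requires) can be sketched roughly as follows. The Plan record and blockFor parameter here are illustrative names, not Cassandra's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class ReadPathSketch {
    // A read plan: one endpoint performs the actual data read, the rest
    // (as required by consistency level) perform digest-only reads.
    record Plan(String dataEndpoint, List<String> digestEndpoints) {}

    // Given live endpoints already sorted by snitch proximity, send the data
    // read to the closest and digest reads to the next (blockFor - 1) nodes.
    static Plan plan(List<String> sortedLiveEndpoints, int blockFor) {
        String data = sortedLiveEndpoints.get(0);
        List<String> digests = new ArrayList<>(
            sortedLiveEndpoints.subList(1, Math.min(blockFor, sortedLiveEndpoints.size())));
        return new Plan(data, digests);
    }

    public static void main(String[] args) {
        // RF=3, QUORUM blocks for 2: one data read plus one digest read,
        // matching the example in the wiki text above.
        Plan p = plan(List.of("a", "b", "c"), 2);
        System.out.println(p.dataEndpoint() + " " + p.digestEndpoints());
    }
}
```

With read repair enabled, the remaining replicas ("c" in the example) would also be sent digest commands, with their responses checked in the background.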
[Cassandra Wiki] Update of ArchitectureInternals by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ArchitectureInternals page has been changed by PeterSchuller. The comment on this change is: Avoid StorageProxy becoming links. http://wiki.apache.org/cassandra/ArchitectureInternals?action=diffrev1=22rev2=23 -- = Read path = * !StorageProxy gets the endpoints (nodes) responsible for replicas of the keys from the !ReplicationStrategy * This may be a !SliceFromReadCommand, a !SliceByNamesReadCommand, or a !RangeSliceReadCommand, depending - * StorageProxy filters the endpoints to contain only those that are currently up/alive + * !StorageProxy filters the endpoints to contain only those that are currently up/alive - * StorageProxy then sorts, by asking the endpoint snitch, the responsible nodes by proximity. + * !StorageProxy then sorts, by asking the endpoint snitch, the responsible nodes by proximity. * The definition of proximity is up to the endpoint snitch * With a SimpleSnitch, proximity directly corresponds to proximity on the token ring. * With the NetworkTopologySnitch, endpoints that are in the same rack are always considered closer than those that are not. Failing that, endpoints in the same data center are always considered closer than those that are not. * The DynamicSnitch, typically enabled in the configuration, wraps whatever underlying snitch (such as SimpleSnitch and NetworkTopologySnitch) so as to dynamically adjust the perceived closeness of endpoints based on their recent performance. This is in an effort to try to avoid routing traffic to endpoints that are slow to respond. - * StorageProxy then arranges for messages to be sent to nodes as required: + * !StorageProxy then arranges for messages to be sent to nodes as required: * The closest node (as determined by proximity sorting as described above) will be sent a command to perform an actual data read (i.e., return data to the co-ordinating node). 
* As required by consistency level, additional nodes may be sent digest commands, asking them to perform the read locally but send back the digest only. For example, at replication factor 3 a read at consistency level QUORUM would require one digest read in addition to the data read sent to the closest node. (See ReadCallback, instantiated by StorageProxy) * If read repair is enabled (probabilistically if read repair chance is somewhere between 0% and 100%), remaining nodes responsible for the row will be sent messages to compute the digest of the response. (Again, see ReadCallback, instantiated by StorageProxy)
[Cassandra Wiki] Update of ArchitectureInternals by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ArchitectureInternals page has been changed by PeterSchuller. The comment on this change is: Clarify that the replication strategy produces endpoints as a function of row keys. http://wiki.apache.org/cassandra/ArchitectureInternals?action=diffrev1=23rev2=24 -- * See [[ArchitectureSSTable]] and ArchitectureCommitLog for more details = Read path = - * !StorageProxy gets the endpoints (nodes) responsible for replicas of the keys from the !ReplicationStrategy + * !StorageProxy gets the endpoints (nodes) responsible for replicas of the keys from the !ReplicationStrategy as a function of the row key (the key of the row being read) * This may be a !SliceFromReadCommand, a !SliceByNamesReadCommand, or a !RangeSliceReadCommand, depending * !StorageProxy filters the endpoints to contain only those that are currently up/alive * !StorageProxy then sorts, by asking the endpoint snitch, the responsible nodes by proximity.
[Cassandra Wiki] Update of ArchitectureInternals by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ArchitectureInternals page has been changed by PeterSchuller. The comment on this change is: AbstractNetworkTopologySnitch does the proximity sorting, not NetworkToplogyStrategy. http://wiki.apache.org/cassandra/ArchitectureInternals?action=diffrev1=24rev2=25 -- * !StorageProxy then sorts, by asking the endpoint snitch, the responsible nodes by proximity. * The definition of proximity is up to the endpoint snitch * With a SimpleSnitch, proximity directly corresponds to proximity on the token ring. - * With the NetworkTopologySnitch, endpoints that are in the same rack are always considered closer than those that are not. Failing that, endpoints in the same data center are always considered closer than those that are not. + * With implementations based on AbstractNetworkTopologySnitch (such as PropertyFileSnitch), endpoints that are in the same rack are always considered closer than those that are not. Failing that, endpoints in the same data center are always considered closer than those that are not. * The DynamicSnitch, typically enabled in the configuration, wraps whatever underlying snitch (such as SimpleSnitch and NetworkTopologySnitch) so as to dynamically adjust the perceived closeness of endpoints based on their recent performance. This is in an effort to try to avoid routing traffic to endpoints that are slow to respond. * !StorageProxy then arranges for messages to be sent to nodes as required: * The closest node (as determined by proximity sorting as described above) will be sent a command to perform an actual data read (i.e., return data to the co-ordinating node).
[Cassandra Wiki] Update of ArchitectureInternals by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ArchitectureInternals page has been changed by PeterSchuller. http://wiki.apache.org/cassandra/ArchitectureInternals?action=diffrev1=25rev2=26 -- * The DynamicSnitch, typically enabled in the configuration, wraps whatever underlying snitch (such as SimpleSnitch and NetworkTopologySnitch) so as to dynamically adjust the perceived closeness of endpoints based on their recent performance. This is in an effort to try to avoid routing traffic to endpoints that are slow to respond. * !StorageProxy then arranges for messages to be sent to nodes as required: * The closest node (as determined by proximity sorting as described above) will be sent a command to perform an actual data read (i.e., return data to the co-ordinating node). -* As required by consistency level, additional nodes may be sent digest commands, asking them to perform the read locally but send back the digest only. For example, at replication factor 3 a read at consistency level QUORUM would require one digest read in addition to the data read sent to the closest node. (See ReadCallback, instantiated by StorageProxy) +* As required by consistency level, additional nodes may be sent digest commands, asking them to perform the read locally but send back the digest only. + * For example, at replication factor 3 a read at consistency level QUORUM would require one digest read in addition to the data read sent to the closest node. (See ReadCallback, instantiated by StorageProxy) * If read repair is enabled (probabilistically if read repair chance is somewhere between 0% and 100%), remaining nodes responsible for the row will be sent messages to compute the digest of the response. 
(Again, see ReadCallback, instantiated by StorageProxy) * On the data node, !ReadVerbHandler gets the data from CFS.getColumnFamily or CFS.getRangeSlice and sends it back as a !ReadResponse * The row is located by doing a binary search on the index in SSTableReader.getPosition
[jira] [Updated] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2463: -- Attachment: 2463-v2.txt I started making it more complicated:
{code}
// the gymnastics here are because
// - we want the buffer large enough that we're not re-buffering when we have to seek back to the
//   start of a row to write the data size. Here, 10% larger than the average row is large enough,
//   meaning we expect to seek and rebuffer about 1/10 of the time.
// - but we don't want to allocate a huge buffer unnecessarily for a small amount of data
// - and on the low end, we don't want to be absurdly stingy with the buffer size for small rows
assert estimatedSize > 0;
long maxBufferSize = Math.min(DatabaseDescriptor.getInMemoryCompactionLimit(), 1024 * 1024);
int bufferSize;
if (estimatedSize < 64 * 1024)
{
    bufferSize = (int) estimatedSize;
}
else
{
    long estimatedRowSize = estimatedSize / keyCount;
    bufferSize = (int) Math.min(Math.max(1.1 * estimatedRowSize, 64 * 1024), maxBufferSize);
}
{code}
... but the larger our buffer is, the larger the penalty for guessing wrong when we have to seek back and rebuffer. Then I went through and added size estimation to the CompactionManager, until I thought it's kind of ridiculous to be worrying about saving a few bytes less than 64KB, especially when we expect most memtables to have more data in them than 64K when flushed. Thus, I arrived at the patch Antoine de Saint-Exupery would have written, attached as v2. Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. 
Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: 2463-v2.txt, patch.diff Original Estimate: 72h Remaining Estimate: 72h
[jira] [Commented] (CASSANDRA-2283) Streaming Old Format Data Fails in 0.7.3 after upgrade from 0.6.8
[ https://issues.apache.org/jira/browse/CASSANDRA-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019085#comment-13019085 ] Stu Hood commented on CASSANDRA-2283: - Looks good: only comment is that BootstrapTest should probably purposely use an old version and check that it is preserved. Streaming Old Format Data Fails in 0.7.3 after upgrade from 0.6.8 - Key: CASSANDRA-2283 URL: https://issues.apache.org/jira/browse/CASSANDRA-2283 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.1 Environment: 0.7.3 upgraded from 0.6.8, Linux Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode) Reporter: Erik Onnen Assignee: Jonathan Ellis Fix For: 0.7.5 Attachments: 2283.txt After successfully upgrading a 0.6.8 ring to 0.7.3, we needed to bootstrap in a new node relatively quickly. When starting the new node with an assigned token in auto bootstrap mode, we see the following exceptions on the new node: INFO [main] 2011-03-07 10:37:32,671 StorageService.java (line 505) Joining: sleeping 3 ms for pending range setup INFO [main] 2011-03-07 10:38:02,679 StorageService.java (line 505) Bootstrapping INFO [HintedHandoff:1] 2011-03-07 10:38:02,899 HintedHandOffManager.java (line 304) Started hinted handoff for endpoint /10.211.14.200 INFO [HintedHandoff:1] 2011-03-07 10:38:02,900 HintedHandOffManager.java (line 360) Finished hinted handoff of 0 rows to endpoint /10.211.14.200 INFO [CompactionExecutor:1] 2011-03-07 10:38:04,924 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuff-f-1 INFO [CompactionExecutor:1] 2011-03-07 10:38:05,390 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuff-f-2 INFO [CompactionExecutor:1] 2011-03-07 10:38:05,768 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuffid-f-1 INFO [CompactionExecutor:1] 2011-03-07 10:38:06,389 SSTableReader.java (line 154) Opening 
/mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuffid-f-2 INFO [CompactionExecutor:1] 2011-03-07 10:38:06,581 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuffid-f-3 ERROR [CompactionExecutor:1] 2011-03-07 10:38:07,056 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.EOFException at org.apache.cassandra.io.sstable.IndexHelper.skipIndex(IndexHelper.java:65) at org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:303) at org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:923) at org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:916) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) INFO [CompactionExecutor:1] 2011-03-07 10:38:08,480 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuffid-f-5 INFO [CompactionExecutor:1] 2011-03-07 10:38:08,582 SSTableReader.java (line 154) Opening /mnt/services/cassandra/var/data/0.7.3/data/Stuff/stuffid_reg_idx-f-1 ERROR [CompactionExecutor:1] 2011-03-07 10:38:08,635 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.EOFException at org.apache.cassandra.io.sstable.IndexHelper.skipIndex(IndexHelper.java:65) at org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:303) at org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:923) at org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:916) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) ERROR [CompactionExecutor:1] 2011-03-07 10:38:08,666 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.EOFException at org.apache.cassandra.io.sstable.IndexHelper.skipIndex(IndexHelper.java:65) at
[jira] [Commented] (CASSANDRA-2338) C* consistency level needs to be pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019088#comment-13019088 ] Peter Schuller commented on CASSANDRA-2338: --- A related concern that is not directly about consistency but about performance is that one may wish to control to what extent requests are sent to more nodes than required by consistency. For example, a very nice property of running with QUORUM + read-repair turned fully on (and let's say RF=3) is that for any request, it's totally fine for a single node to be e.g. slow without the application seeing poor latency. If the dynamic snitch is engaged, the slow node should usually not be the one that is considered closest, so it's not the one getting the data read. Turning off read-repair negates that, since without read repair, messages are sent only to the nodes required by consistency level. So if any node is slow, the request will be slow. Also related is that for reads that are expected to be small, the digest-only optimization may be irrelevant. For many cases, the disk I/O and perhaps CPU cost is going to be a lot more relevant than the overhead of sending an extra ~40 bytes or whatever over the network. In such cases, it is probably often preferable to send read commands to all, or at least multiple, nodes so that one is not depending on a specific node being up and fast in order to return the data. For example, suppose I have a low-consistency situation where I care about good latency in terms of avoiding outliers. While CL.ONE is the typical suggestion, better avoidance of outliers should be possible if CL.ONE is used but each node is sent a full read command, such that the request can complete immediately whenever any node responds, without waiting for a timeout (or just a slow response not timing out) from the node that happens to be considered closest. 
This may be out of scope for this ticket, but maybe worth at least thinking about. If Cassandra can offer, at reasonable complexity for the application writer, detailed choices for all of these at the same time: (1) Least number of endpoints for consistency, at a per-DC level (to control consistency). (2) Maximum allowed, at a per-DC level (to control latency). (3) Pessimistic over-messaging (to control latency, in particular outliers). ... it should be enough to cover a great many cases (and from a PR perspective, under the assumption that the complexity cost is not too high, it would really showcase what kind of detailed control and specific tuning is fundamentally possible given the data and messaging model). C* consistency level needs to be pluggable -- Key: CASSANDRA-2338 URL: https://issues.apache.org/jira/browse/CASSANDRA-2338 Project: Cassandra Issue Type: New Feature Reporter: Matthew F. Dennis Priority: Minor For cases where people want to run C* across multiple DCs for disaster recovery et cetera, where normal operations only happen in the first DC (e.g. no writes/reads happen in the remote DC under normal operation), neither LOCAL_QUORUM nor EACH_QUORUM really suffices. Consider the case with RF of DC1:3 DC2:2. LOCAL_QUORUM doesn't provide any guarantee that data is in the remote DC. EACH_QUORUM requires that both nodes in the remote DC are up. It would be useful in some situations to be able to specify a strategy where LOCAL_QUORUM is used for the local DC plus at least one node in a remote DC (and/or at least one node in *each* remote DC). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
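The outlier-avoidance idea in the comment above — at CL.ONE, send a full read command to every replica and complete as soon as the first one responds — can be sketched with futures. This is an illustration of the proposal, not existing Cassandra code; the names are invented for the example:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class SpeculativeRead {
    // At CL.ONE, rather than sending the data read only to the closest
    // endpoint, send a full read to every replica and return whichever
    // response arrives first, so one slow node cannot delay the request.
    static String readFromAny(List<CompletableFuture<String>> replicaReads) {
        return (String) CompletableFuture
            .anyOf(replicaReads.toArray(new CompletableFuture[0]))
            .join();
    }

    public static void main(String[] args) {
        CompletableFuture<String> slow = new CompletableFuture<>();   // a replica that never answers here
        CompletableFuture<String> fast = CompletableFuture.completedFuture("row");
        // Completes immediately from the fast replica despite the slow one.
        System.out.println(readFromAny(List.of(slow, fast)));
    }
}
```

The trade-off the comment raises is exactly the one visible here: the extra data reads tax disk and network on every replica, in exchange for latency that tracks the fastest node rather than the one the snitch guessed was closest.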
[Cassandra Wiki] Update of ReadRepair by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ReadRepair page has been changed by PeterSchuller. http://wiki.apache.org/cassandra/ReadRepair?action=diffrev1=2rev2=3 -- - Read repair means that when a query is made against a given key, we perform that query against all the replicas of the key and push the most recent version to any out-of-date replicas. If a low !ConsistencyLevel was specified, this is done in the background after returning the data from the closest replica to the client; otherwise, it is done before returning the data. + Read repair means that when a query is made against a given key, we perform a [[DigestQueries|digest query]] against all the replicas of the key and push the most recent version to any out-of-date replicas. If a low !ConsistencyLevel was specified, this is done in the background after returning the data from the closest replica to the client; otherwise, it is done before returning the data. - - (To reduce the impact on the network, only the closest replica to the coordinator node actually sends the full result set; the others send hashes.) This means that in almost all cases, at most the first instance of a query will return old data.
[Cassandra Wiki] Update of DigestQueries by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The DigestQueries page has been changed by PeterSchuller. http://wiki.apache.org/cassandra/DigestQueries -- New page: A digest query is like a read query except that instead of the receiving node actually returning the data, it only returns a digest (hash) of the would-be data. The intent of submitting a digest query is to discover whether two or more nodes agree on what the current data is, without sending the data over the network. In particular for large amounts of data, this is a significant saving of bandwidth cost relative to sending the full data response. Keep in mind that the cost of potentially going down to disk, and most or all of the CPU cost, associated with a query will still be taken on nodes that receive digest queries. The optimization is only for bandwidth.
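As a concrete illustration of the digest idea described above: each replica hashes the response it would have sent, and the coordinator compares the hashes. MD5 here is a stand-in hash for the sketch, and the class is illustrative, not Cassandra's actual digest code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestSketch {
    // A digest read returns only a hash of the would-be response, so replicas
    // can be compared without shipping the row data itself over the network.
    static byte[] digest(byte[] wouldBeResponse) {
        try {
            return MessageDigest.getInstance("MD5").digest(wouldBeResponse);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        byte[] a = digest("row-contents".getBytes(StandardCharsets.UTF_8));
        byte[] b = digest("row-contents".getBytes(StandardCharsets.UTF_8));
        // Matching digests mean the replicas agree; 16 bytes cross the wire
        // instead of the full row.
        System.out.println(Arrays.equals(a, b) + " " + a.length);
    }
}
```

Note that, exactly as the wiki page says, computing the digest still costs the replica the full disk and CPU work of the read; only the bandwidth of the response is saved.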
Page HowToPublishToMavenCentral renamed to HowToPublishReleases on Cassandra Wiki
Dear wiki user, You have subscribed to a wiki page Cassandra Wiki for change notification. The page HowToPublishReleases has been renamed from HowToPublishToMavenCentral by EricEvans. http://wiki.apache.org/cassandra/HowToPublishReleases
[jira] [Commented] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019097#comment-13019097 ] Peter Schuller commented on CASSANDRA-2463: --- A noteworthy factor here is that unless fsync()+fadvise()/madvise() has evicted the data, in the normal case this stuff should still be in page cache for any reasonably sized row. For truly huge rows, the penalty of seeking back should be insignificant anyway. Total +1 on avoiding huge allocations. I was surprised to realize, when this ticket came along, that this was happening ;) I have been suspecting that the bloom filters are a major concern too with respect to triggering promotion failures (but I haven't done testing to confirm this). Are there other cases than this and the bloom filters where we know that we're doing large allocations? Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: 2463-v2.txt, patch.diff Original Estimate: 72h Remaining Estimate: 72h Currently, Cassandra 0.7.x allocates a 256MB contiguous byte array at the beginning of a memtable flush or compaction (presently hard-coded as Config.in_memory_compaction_limit_in_mb). When several memtable flushes are triggered at once (as by `nodetool flush` or `nodetool snapshot`), the tenured generation will typically experience extreme pressure as it attempts to locate [n] contiguous 256MB chunks of heap to allocate. This will often trigger a promotion failure, resulting in a stop-the-world GC until the allocation can be made. 
(Note that in the case of the release valve being triggered, the problem is even further exacerbated; the release valve will ironically trigger two contiguous 256MB allocations when attempting to flush the two largest memtables). This patch sets the buffer to be used by BufferedRandomAccessFile to Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE) rather than a hard-coded 256MB. The typical resulting buffer size is 64KB. I've taken some time to measure the impact of this change on the base 0.7.4 release and with this patch applied. This test involved launching Cassandra, performing four million writes across three column families from three clients, and monitoring heap usage and garbage collections. Cassandra was launched with 2GB of heap and the default JVM options shipped with the project. This configuration has 7 column families with a total of 15GB of data. Here's the base 0.7.4 release: http://cl.ly/413g2K06121z252e2t10 Note that on launch, we see a flush + compaction triggered almost immediately, resulting in at least 7x very quick 256MB allocations maxing out the heap, resulting in a promotion failure and a full GC. As flushes proceed, we see that most of these have a corresponding CMS, consistent with the pattern of a large allocation and immediate collection. We see a second promotion failure and full GC at the 75% mark as the allocations cannot be satisfied without a collection, along with several CMSs in between. In the failure cases, the allocation requests occur so quickly that a standard CMS phase cannot be completed before a ParNew attempts to promote the surviving byte array into the tenured generation. The heap usage and GC profile of this graph is very unhealthy. Here's the 0.7.4 release with this patch applied: http://cl.ly/050I1g26401B1X0w3s1f This graph is very different. At launch, rather than an immediate spike to full allocation and a promotion failure, we see a slow allocation slope reaching only 1/8th of total heap size. 
As writes begin, we see several flushes and compactions, but none result in immediate, large allocations. The ParNew collector keeps up with collections far more ably, resulting in only one healthy CMS collection with no promotion failure. Unlike the unhealthy rapid allocation and massive collection pattern we see in the first graph, this graph depicts a healthy sawtooth pattern of ParNews and an occasional effective CMS with no danger of heap fragmentation resulting in a promotion failure. The bottom line is that there's no need to allocate a hard-coded 256MB write buffer for flushing memtables and compactions to disk. Doing so results in unhealthy rapid allocation patterns and increases the probability of triggering promotion failures and full stop-the-world GCs which can cause nodes to become unresponsive and shunned from the ring during flushes and compactions.
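The sizing rule the patch description quotes, Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE), can be sketched as below. The 64KB constant is an assumption taken from the "typical resulting buffer size" noted above, and the class name is illustrative:

```java
// Sketch of the patch's buffer-sizing rule: cap the write buffer at a small
// default instead of a hard-coded 256MB contiguous allocation.
public class BufferSizing {
    static final long DEFAULT_BUFFER_SIZE = 64 * 1024; // assumed 64KB default

    static long bufferSize(long bytesToWrite) {
        // Small flushes get an exactly-sized buffer; large ones are capped,
        // so no flush or compaction ever demands a 256MB contiguous chunk.
        return Math.min(bytesToWrite, DEFAULT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(bufferSize(1_000));              // sized to the data: 1000
        System.out.println(bufferSize(256L * 1024 * 1024)); // capped: 65536
    }
}
```

The GC benefit follows directly: many 64KB allocations are easy for ParNew to satisfy and collect, whereas concurrent 256MB allocations fragment the tenured generation and provoke promotion failures.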
[jira] [Commented] (CASSANDRA-2430) Show list of possibly corrupt sstables when receiving errors
[ https://issues.apache.org/jira/browse/CASSANDRA-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019108#comment-13019108 ] Joaquin Casares commented on CASSANDRA-2430: And another sample error where this would help: ERROR 23:21:03,693 Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: java.io.EOFException: attempted to skip 1647598019 bytes but only skipped 27856771 at org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:117) Show list of possibly corrupt sstables when receiving errors Key: CASSANDRA-2430 URL: https://issues.apache.org/jira/browse/CASSANDRA-2430 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Joaquin Casares Priority: Minor On errors such as these: ERROR [CompactionExecutor:1] 2011-04-06 11:57:00,125 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[CompactionExecutor:1,1,main] java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 It would be nice to know which sstable, and perhaps sstables, are causing these errors. Any additional information would also be beneficial in helping narrow the scope of corruption. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2462) Fix build for distributed and stress tests
[ https://issues.apache.org/jira/browse/CASSANDRA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019109#comment-13019109 ] Stu Hood commented on CASSANDRA-2462: - Not having a default will break standard stress runs unless people specify the RF flag, right? Fix build for distributed and stress tests -- Key: CASSANDRA-2462 URL: https://issues.apache.org/jira/browse/CASSANDRA-2462 Project: Cassandra Issue Type: Bug Components: Tests, Tools Affects Versions: 0.8 Reporter: Stu Hood Assignee: Stu Hood Priority: Minor Fix For: 0.8 Attachments: 0001-Update-stress-and-tests-for-trunk.txt Distributed and stress tests are not compiling for trunk. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2464) Distributed test for read repair
Distributed test for read repair Key: CASSANDRA-2464 URL: https://issues.apache.org/jira/browse/CASSANDRA-2464 Project: Cassandra Issue Type: Test Reporter: Stu Hood Assignee: Stu Hood Fix For: 0.8 See title. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019122#comment-13019122 ] Jonathan Ellis commented on CASSANDRA-2463: --- (I wonder if this is the cause of the intermittent load-spikes-after-upgrade-to-0.7 reports we've seen.) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: 2463-v2.txt, patch.diff Original Estimate: 72h Remaining Estimate: 72h -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019152#comment-13019152 ] Erik Onnen commented on CASSANDRA-2463: --- As a data point to that question, we hardly ever had CMS collections on 0.6.8 and maybe one full GC ever that I can think of for what was years of cumulative uptime. It surely differs for workloads, but in our case 0.7 got much worse along the CMS dimension. Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: 2463-v2.txt, patch.diff Original Estimate: 72h Remaining Estimate: 72h -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2402) Python dbapi driver for CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Hobbs updated CASSANDRA-2402: --- Attachment: 2402.txt Attached patch is a good first step towards dbapi support. Everything should be compliant with PEP 249; we just don't implement some optional features at this time. Python dbapi driver for CQL --- Key: CASSANDRA-2402 URL: https://issues.apache.org/jira/browse/CASSANDRA-2402 Project: Cassandra Issue Type: Task Reporter: Jon Hermes Assignee: Jon Hermes Fix For: 0.8 Attachments: 2402.txt Create a driver that emulates python's dbapi. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2465) Pig load/storefunc loads only one schema and BytesType validation class needs fix
Pig load/storefunc loads only one schema and BytesType validation class needs fix - Key: CASSANDRA-2465 URL: https://issues.apache.org/jira/browse/CASSANDRA-2465 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Jeremy Hanna Fix For: 0.7.5 With a recent optimization, it appears that the Pig load/store func gets only one schema from Cassandra and tries to apply it to all CFs in the pig script. Also, the BytesType validation tries to cast the object in putNext as a DataByteArray and wrap it as a ByteBuffer. Instead it should just call objToBB which should take care of it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2465) Pig load/storefunc loads only one schema and BytesType validation class needs fix
[ https://issues.apache.org/jira/browse/CASSANDRA-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-2465: Attachment: 2465.txt Pig load/storefunc loads only one schema and BytesType validation class needs fix - Key: CASSANDRA-2465 URL: https://issues.apache.org/jira/browse/CASSANDRA-2465 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Jeremy Hanna Labels: hadoop, pig Fix For: 0.7.5 Attachments: 2465.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2465) Pig load/storefunc loads only one schema and BytesType validation class needs fix
[ https://issues.apache.org/jira/browse/CASSANDRA-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019164#comment-13019164 ] Jeremy Hanna commented on CASSANDRA-2465: - Attaching patch to make the udf context key specific to the keyspace and column family so it doesn't get overwritten. Also changed the putNext case where the validation class is BytesType to use objToBB the way it should to handle things like Strings. Pig load/storefunc loads only one schema and BytesType validation class needs fix - Key: CASSANDRA-2465 URL: https://issues.apache.org/jira/browse/CASSANDRA-2465 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Jeremy Hanna Labels: hadoop, pig Fix For: 0.7.5 Attachments: 2465.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2464) Distributed test for read repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-2464: Attachment: 0001-Distributed-test-for-read-repair.txt Adds a test for read repair... I haven't actually run it against trunk since I'm trying to figure out a Whirr issue, but it works in our branch (which is still using old Whirr). Distributed test for read repair Key: CASSANDRA-2464 URL: https://issues.apache.org/jira/browse/CASSANDRA-2464 Project: Cassandra Issue Type: Test Reporter: Stu Hood Assignee: Stu Hood Labels: test Fix For: 0.8 Attachments: 0001-Distributed-test-for-read-repair.txt See title. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2465) Pig load/storefunc loads only one schema and BytesType validation class needs fix
[ https://issues.apache.org/jira/browse/CASSANDRA-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019166#comment-13019166 ] Jeremy Hanna commented on CASSANDRA-2465: - Tested with basic row count pig script as well as multiple join/cogroup operations against multiple column families. Tested input and output. Pig load/storefunc loads only one schema and BytesType validation class needs fix - Key: CASSANDRA-2465 URL: https://issues.apache.org/jira/browse/CASSANDRA-2465 Project: Cassandra Issue Type: Bug Reporter: Jeremy Hanna Assignee: Jeremy Hanna Labels: hadoop, pig Fix For: 0.7.5 Attachments: 2465.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up
[ https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019171#comment-13019171 ] Aaron Morton commented on CASSANDRA-2088: - Thanks, will take another look at the cleanup for compaction. Temp files for failed compactions/streaming not cleaned up -- Key: CASSANDRA-2088 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088 Project: Cassandra Issue Type: Bug Components: Core Reporter: Stu Hood Assignee: Aaron Morton Fix For: 0.8 Attachments: 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 0002-delete-partial-sstable-if-compaction-error.patch From separate reports, compaction and repair are currently missing opportunities to clean up tmp files after failures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Cassandra Wiki] Update of LargeDataSetConsiderations by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The LargeDataSetConsiderations page has been changed by PeterSchuller. The comment on this change is: Reflect that CASSANDRA-1555 is fixed as of 0.7.1. http://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=16&rev2=17 -- * Was fixed/improved as of [[https://issues.apache.org/jira/browse/CASSANDRA-1878|CASSANDRA-1878]], for 0.6.9 and 0.7.0. * The operating system's page cache is affected by compaction and repair operations. If you are relying on the page cache to keep the active set in memory, you may see significant degradation in performance as a result of compaction and repair operations. * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1470|CASSANDRA-1470]], [[https://issues.apache.org/jira/browse/CASSANDRA-1882|CASSANDRA-1882]]. - * If you have column families with more than 143 million row keys in them, bloom filter false positive rates are likely to go up because of implementation concerns that limit the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. The negative effects of hitting this limit are that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable. + * Prior to 0.7.1 (fixed in [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]]), if you had column families with more than 143 million row keys in them, bloom filter false positive rates would be likely to go up because of implementation concerns that limited the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. 
The negative effects of hitting this limit are that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable. - * This will likely be addressed in the future: See [[https://issues.apache.org/jira/browse/CASSANDRA-1608|CASSANDRA-1608]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]] * Compaction is currently not concurrent, so only a single compaction runs at a time. This means that sstable counts may spike during larger compactions as several smaller sstables are written while a large compaction is happening. This can cause additional seeks on reads. * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1876|CASSANDRA-1876]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1881|CASSANDRA-1881]] * Consider the choice of file system. Removal of large files is notoriously slow and seek bound on e.g. ext2/ext3. Consider xfs or ext4fs. This affects background unlink():ing of sstables that happens every now and then, and also affects start-up time (if there are sstables pending removal when a node is starting up, they are removed as part of the start-up process; it may thus be detrimental if removing a terabyte of sstables takes an hour (numbers are ballparks, not accurately measured, and depend on circumstances)).
[Cassandra Wiki] Update of LargeDataSetConsiderations by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The LargeDataSetConsiderations page has been changed by PeterSchuller. The comment on this change is: Reflect that CASSANDRA-2191 may be addressing compaction concurrency for 0.8. http://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=17&rev2=18 -- * Prior to 0.7.1 (fixed in [[https://issues.apache.org/jira/browse/CASSANDRA-1555|CASSANDRA-1555]]), if you had column families with more than 143 million row keys in them, bloom filter false positive rates would be likely to go up because of implementation concerns that limited the maximum size of a bloom filter. See [[ArchitectureInternals]] for information on how bloom filters are used. The negative effects of hitting this limit are that reads will start taking additional seeks to disk as the row count increases. Note that the effect you are seeing at any given moment will depend on when compaction was last run, because the bloom filter limit is per-sstable. It is an issue for column families because after a major compaction, the entire column family will be in a single sstable. * Compaction is currently not concurrent, so only a single compaction runs at a time. This means that sstable counts may spike during larger compactions as several smaller sstables are written while a large compaction is happening. This can cause additional seeks on reads. * Potential future improvements: [[https://issues.apache.org/jira/browse/CASSANDRA-1876|CASSANDRA-1876]] and [[https://issues.apache.org/jira/browse/CASSANDRA-1881|CASSANDRA-1881]] + * Potentially already fixed for 0.8 (todo: go through ticket history and make sure what it implies): [[https://issues.apache.org/jira/browse/CASSANDRA-2191|CASSANDRA-2191]] * Consider the choice of file system. Removal of large files is notoriously slow and seek bound on e.g. ext2/ext3. Consider xfs or ext4fs. 
This affects background unlink():ing of sstables that happens every now and then, and also affects start-up time (if there are sstables pending removal when a node is starting up, they are removed as part of the start-up process; it may thus be detrimental if removing a terabyte of sstables takes an hour (numbers are ballparks, not accurately measured, and depend on circumstances)). * Adding nodes is a slow process if each node is responsible for a large amount of data. Plan for this; do not try to throw additional hardware at a cluster at the last minute. * Cassandra will read through sstable index files on start-up, doing what is known as index sampling. This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
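The start-up index sampling mentioned in the last bullet (keeping 1 out of every 100 keys and their on-disk positions in memory) might look roughly like this sketch; the class and method names are illustrative, and the interval matches the stated default:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of sstable index sampling: retain every
// INDEX_INTERVAL-th key from the sorted on-disk index in memory.
public class IndexSampling {
    static final int INDEX_INTERVAL = 100; // the "1 out of 100" default above

    static List<String> sample(List<String> sortedKeys) {
        List<String> sampled = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += INDEX_INTERVAL) {
            sampled.add(sortedKeys.get(i));
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) keys.add(String.format("key%07d", i));
        // A million-key index yields a 10,000-entry in-memory sample; the
        // full index must still be scanned, which is why start-up slows
        // down as index files grow.
        System.out.println(sample(keys).size()); // 10000
    }
}
```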
[jira] [Commented] (CASSANDRA-2463) Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers
[ https://issues.apache.org/jira/browse/CASSANDRA-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019179#comment-13019179 ] Peter Schuller commented on CASSANDRA-2463: --- Filed CASSANDRA-2466 for the bloom filter case. Flush and Compaction Unnecessarily Allocate 256MB Contiguous Buffers Key: CASSANDRA-2463 URL: https://issues.apache.org/jira/browse/CASSANDRA-2463 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7.4 Environment: Any Reporter: C. Scott Andreas Labels: patch Fix For: 0.7.4 Attachments: 2463-v2.txt, patch.diff Original Estimate: 72h Remaining Estimate: 72h Currently, Cassandra 0.7.x allocates a 256MB contiguous byte array at the beginning of a memtable flush or compaction (presently hard-coded as Config.in_memory_compaction_limit_in_mb). When several memtable flushes are triggered at once (as by `nodetool flush` or `nodetool snapshot`), the tenured generation will typically experience extreme pressure as it attempts to locate [n] contiguous 256mb chunks of heap to allocate. This will often trigger a promotion failure, resulting in a stop-the-world GC until the allocation can be made. (Note that in the case of the release valve being triggered, the problem is even further exacerbated; the release valve will ironically trigger two contiguous 256MB allocations when attempting to flush the two largest memtables). This patch sets the buffer to be used by BufferedRandomAccessFile to Math.min(bytesToWrite, BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE) rather than a hard-coded 256MB. The typical resulting buffer size is 64kb. I've taken some time to measure the impact of this change on the base 0.7.4 release and with this patch applied. This test involved launching Cassandra, performing four million writes across three column families from three clients, and monitoring heap usage and garbage collections. 
Cassandra was launched with 2GB of heap and the default JVM options shipped with the project. This configuration has 7 column families with a total of 15GB of data. Here's the base 0.7.4 release: http://cl.ly/413g2K06121z252e2t10 Note that on launch, we see a flush + compaction triggered almost immediately, resulting in at least 7x very quick 256MB allocations that max out the heap and trigger a promotion failure and a full GC. As flushes proceed, we see that most of these have a corresponding CMS, consistent with the pattern of a large allocation and immediate collection. We see a second promotion failure and full GC at the 75% mark as the allocations cannot be satisfied without a collection, along with several CMSs in between. In the failure cases, the allocation requests occur so quickly that a standard CMS phase cannot complete before a ParNew attempts to promote the surviving byte array into the tenured generation. The heap usage and GC profile of this graph is very unhealthy. Here's the 0.7.4 release with this patch applied: http://cl.ly/050I1g26401B1X0w3s1f This graph is very different. At launch, rather than an immediate spike to full allocation and a promotion failure, we see a slow allocation slope reaching only 1/8th of total heap size. As writes begin, we see several flushes and compactions, but none result in immediate, large allocations. The ParNew collector keeps up with collections far more ably, resulting in only one healthy CMS collection with no promotion failure. Unlike the unhealthy rapid allocation and massive collection pattern we see in the first graph, this graph depicts a healthy sawtooth pattern of ParNews and an occasional effective CMS with no danger of heap fragmentation resulting in a promotion failure. The bottom line is that there's no need to allocate a hard-coded 256MB write buffer for flushing memtables and compactions to disk. 
Doing so results in unhealthy rapid allocation patterns and increases the probability of triggering promotion failures and full stop-the-world GCs which can cause nodes to become unresponsive and shunned from the ring during flushes and compactions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
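The buffer-sizing change the patch describes can be sketched as follows. This is an illustrative stand-in, not the actual Cassandra 0.7 code: the class and method names are invented, and only the `Math.min(bytesToWrite, DEFAULT_BUFFER_SIZE)` expression comes from the ticket.

```java
// Sketch of the patch's idea: cap the flush/compaction write buffer at the
// smaller of the bytes to be written and a modest default (64 KB), instead
// of a hard-coded 256 MB contiguous allocation.
public class FlushBufferSizing {
    // Mirrors BufferedRandomAccessFile.DEFAULT_BUFFER_SIZE as described (64 KB).
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;

    static int bufferSize(long bytesToWrite) {
        return (int) Math.min(bytesToWrite, DEFAULT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        // A small memtable needs only as much buffer as it will actually write.
        System.out.println(bufferSize(4096));        // 4096
        // A large flush is capped at 64 KB rather than 256 MB.
        System.out.println(bufferSize(256L << 20));  // 65536
    }
}
```

The point is that the largest single allocation per flush drops from 256 MB to 64 KB, which ParNew can satisfy from the young generation without ever asking the old generation for a huge contiguous block.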
[jira] [Created] (CASSANDRA-2466) bloom filters should avoid huge array allocations to avoid fragmentation concerns
bloom filters should avoid huge array allocations to avoid fragmentation concerns - Key: CASSANDRA-2466 URL: https://issues.apache.org/jira/browse/CASSANDRA-2466 Project: Cassandra Issue Type: Bug Reporter: Peter Schuller Priority: Minor The fact that bloom filters are backed by single large arrays of longs is expected to interact badly with promotion of objects into old gen with CMS, due to fragmentation concerns (as discussed in CASSANDRA-2463). It should be less of an issue than CASSANDRA-2463 in the sense that you need to have a lot of rows before the array sizes become truly huge. For comparison, the ~ 143 million row key limit implied by the use of 'int' in BitSet prior to the switch to OpenBitSet translates roughly to 238 MB (assuming the limitation factor there was the addressability of the bits with a 32 bit int, which is my understanding). Having a preliminary look at OpenBitSet with an eye towards replacing the single long[] with multiple arrays, it seems that if we're willing to drop some of the functionality that is not used for bloom filter purposes, the bits[i] indexing should be pretty easy to augment with modulo to address an appropriate smaller array. Locality is not an issue since the bloom filter case is the worst possible case for locality anyway, and it doesn't matter whether it's one huge array or a number of ~ 64k arrays. Callers may be affected, like BloomFilterSerializer, which cares about the underlying bit array. If the full functionality of OpenBitSet is to be maintained (e.g., xorCount), some additional acrobatics would be necessary, presumably at a noticeable performance cost if such operations were to be used in performance-critical places. An argument against touching OpenBitSet is that it seems to be pretty carefully written and tested and has some non-trivial details, and people have seemingly benchmarked it quite carefully. 
On the other hand, the improvement would then apply to other things as well, such as the bitsets used to keep track of in-core pages (off the cuff for scale, a 64 GB sstable should imply a 2 MB bit set, with one bit per 4 KB page).
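The "augment bits[i] indexing with modulo" idea above can be sketched as a minimal chunked bit set. This is illustrative only, not the actual OpenBitSet code; the chunk size of 8192 longs (64 KB per chunk, matching the "~ 64k arrays" figure in the ticket) is an assumption:

```java
// A minimal sketch of backing a bit set with several fixed-size long[] chunks
// instead of one huge long[], so no single allocation exceeds 64 KB and the
// old generation never needs a multi-hundred-MB contiguous block.
public class ChunkedBitSet {
    static final int WORDS_PER_CHUNK = 8 * 1024;   // 8192 longs = 64 KB per chunk
    private final long[][] chunks;

    ChunkedBitSet(long numBits) {
        long numWords = (numBits + 63) >>> 6;      // 64 bits per long
        int numChunks = (int) ((numWords + WORDS_PER_CHUNK - 1) / WORDS_PER_CHUNK);
        chunks = new long[numChunks][];
        for (int i = 0; i < numChunks; i++)
            chunks[i] = new long[WORDS_PER_CHUNK];
    }

    // The usual bits[word] indexing, augmented with division/modulo to pick
    // the chunk and the offset within it.
    void set(long bit) {
        long word = bit >>> 6;
        chunks[(int) (word / WORDS_PER_CHUNK)][(int) (word % WORDS_PER_CHUNK)]
            |= 1L << (bit & 63);
    }

    boolean get(long bit) {
        long word = bit >>> 6;
        return (chunks[(int) (word / WORDS_PER_CHUNK)][(int) (word % WORDS_PER_CHUNK)]
                & (1L << (bit & 63))) != 0;
    }
}
```

As the ticket argues, the extra division/modulo per access is irrelevant for bloom filters, whose probes are effectively random and cache-hostile regardless of whether the backing storage is one array or many.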
[jira] [Updated] (CASSANDRA-2128) Corrupted Commit logs
[ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2128: -- Fix Version/s: 0.7.2 Corrupted Commit logs - Key: CASSANDRA-2128 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128 Project: Cassandra Issue Type: Bug Affects Versions: 0.6 Environment: cassandra-0.6 @ r1064246 (0.6.11) Ubuntu 9.10 Rackspace Cloud Reporter: Paul Querna Assignee: Jonathan Ellis Fix For: 0.6.12, 0.7.2 Attachments: 2128.txt Two of our nodes had a hard failure. They both came up with a corrupted commit log. On startup we get this:
{quote}
2011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
2011-02-07_19:34:03.95403 java.io.EOFException
2011-02-07_19:34:03.95403 at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
2011-02-07_19:34:03.95404 at java.io.DataInputStream.readUTF(DataInputStream.java:572)
2011-02-07_19:34:03.95405 at java.io.DataInputStream.readUTF(DataInputStream.java:547)
2011-02-07_19:34:03.95406 at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
2011-02-07_19:34:03.95407 at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
2011-02-07_19:34:03.95408 at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
2011-02-07_19:34:03.95409 at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
2011-02-07_19:34:03.95409 at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
2011-02-07_19:34:03.95410 at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
2011-02-07_19:34:03.95422 Exception encountered during startup.
2011-02-07_19:34:03.95436 java.io.EOFException
2011-02-07_19:34:03.95447 at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
2011-02-07_19:34:03.95458 at java.io.DataInputStream.readUTF(DataInputStream.java:572)
2011-02-07_19:34:03.95468 at java.io.DataInputStream.readUTF(DataInputStream.java:547)
2011-02-07_19:34:03.95478 at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
2011-02-07_19:34:03.95489 at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
2011-02-07_19:34:03.95499 at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
2011-02-07_19:34:03.95510 at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
2011-02-07_19:34:03.95521 at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
2011-02-07_19:34:03.95531 at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
{quote}
On node A, the commit log in question is 100MB. On node B, it is 60MB. An ideal resolution: if EOF is hit early, log something but don't abort startup; instead, apply everything successfully read so far and keep going.
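The suggested resolution can be sketched as follows. This is a hedged illustration, not Cassandra's actual CommitLog recovery code: the length-prefixed-UTF record format here stands in for the real RowMutation serialization, and `replay`/`truncatedSegment` are invented names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of EOF-tolerant commit log replay: an unexpected EOF mid-record
// (e.g. after a hard node failure truncated the segment) is treated as
// "end of valid data" rather than a fatal startup error.
public class TolerantReplay {
    static List<String> replay(byte[] segment) {
        List<String> mutations = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(segment));
        try {
            while (in.available() > 0)
                mutations.add(in.readUTF());  // stand-in for deserializing one RowMutation
        } catch (EOFException eof) {
            // Truncated tail: log it, keep everything replayed so far, continue startup.
            System.err.println("Unexpected EOF reading commit log; replayed "
                               + mutations.size() + " mutations, skipping the rest");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return mutations;
    }

    // Build a segment with two complete records and a truncated third
    // (a length prefix claiming 9 bytes follow, with no body).
    static byte[] truncatedSegment() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF("mutation-1");
            out.writeUTF("mutation-2");
            out.writeShort(9);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }
}
```

Replaying the truncated segment recovers the two complete mutations and skips the torn tail, which is exactly the behavior the reporter asks for: salvage what deserialized cleanly instead of failing the whole startup.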
[jira] [Updated] (CASSANDRA-2457) Batch_mutate is broken for counters
[ https://issues.apache.org/jira/browse/CASSANDRA-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stu Hood updated CASSANDRA-2457: Reviewer: stuhood Batch_mutate is broken for counters --- Key: CASSANDRA-2457 URL: https://issues.apache.org/jira/browse/CASSANDRA-2457 Project: Cassandra Issue Type: Bug Affects Versions: 0.8 Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Fix For: 0.8 Original Estimate: 4h Remaining Estimate: 4h CASSANDRA-2384 allowed batch_mutate to take counter and non-counter operations, but the code was not updated correctly to handle that case. As it is, the code uses the first mutation in the batch list to decide whether to apply the counter write path or not, and will thus break if the two are mixed.