Re: Schema versions reflect schemas on unwanted nodes
Not sure if anyone has seen this before but it's really killing me right now. Perhaps that was too long a description of the issue, so here's a more succinct question -- how do I remove nodes associated with a cluster that contain no data and have no reason to be associated with the cluster whatsoever? My last resort here is to stop Cassandra (after recording all tokens for each node), set the initial token for each node in the cluster in cassandra.yaml, manually delete the LocationInfo* sstables in the system keyspace, and then restart. I'm hoping there's a simpler, less risky way to do this, so please, please let me know if that's true! Thanks again. - Eric

On Tue, Oct 11, 2011 at 11:55 AM, Eric Czech e...@nextbigsound.com wrote: Hi, I'm having what I think is a fairly uncommon schema issue. My situation is that I had a cluster with 10 nodes and a consistent schema. Then, in an experiment to set up a second cluster with the same information (by copying the raw sstables), I left the LocationInfo* sstables in the system keyspace in the new cluster, and after starting the second cluster I realized that the two clusters were discovering each other when they shouldn't have been. Since then, I changed the cluster name for the second cluster and made sure to delete the LocationInfo* sstables before starting it, and the two clusters are now operating independently of one another for the most part. The only remaining connection between the two seems to be that the first cluster is still maintaining references to nodes in the second cluster in its schema versions, despite those nodes not actually being part of the ring. Here's what describe cluster looks like on the original cluster:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
      48971cb0-e9ff-11e0--eb9eab7d90bf: [INTENTIONAL_IP1, INTENTIONAL_IP2, ..., INTENTIONAL_IP10]
      848bcfc0-eddf-11e0--8a3bb58f08ff: [NOT_INTENTIONAL_IP1, NOT_INTENTIONAL_IP2]

The second cluster, however, contains no schema versions involving nodes from the first cluster. My question then is, how can I remove those schema versions from the original cluster that are associated with the unwanted nodes from the second cluster? Is there any way to remove or evict an IP from a cluster instead of just a token? Thanks in advance! - Eric
Re: Schema versions reflect schemas on unwanted nodes
I don't think that's what I'm after here, since the unwanted nodes were originally assimilated into the cluster with the same initial_token values as other nodes that were already in the cluster (and that have, and still do have, useful data). I know this is an awkward situation so I'll try to depict it in a simpler way. Let's say I have a simplified version of our production cluster that looks like this -

cass-1 token = A
cass-2 token = B
cass-3 token = C

Then I tried to create a second cluster that looks like this -

cass-analysis-1 token = A (and contains same data as cass-1)
cass-analysis-2 token = B (and contains same data as cass-2)
cass-analysis-3 token = C (and contains same data as cass-3)

But after starting the second cluster, things got crossed up between the clusters and here's what the original cluster now looks like -

cass-1 token = A (has data and schema)
cass-2 token = B (has data and schema)
cass-3 token = C (has data and schema)
cass-analysis-1 token = A (has *no* data and is not part of the ring, but is trying to be included in the cluster schema)

A simplified version of describe cluster for the original cluster now shows:

Cluster Information:
   Schema versions:
      SCHEMA-UUID-1: [cass-1, cass-2, cass-3]
      SCHEMA-UUID-2: [cass-analysis-1]

But the simplified ring looks like this (has only 3 nodes instead of 4):

Host     Owns   Token
cass-1   33%    A
cass-2   33%    B
cass-3   33%    C

The original cluster is still working correctly, but all live schema updates are failing because of the inconsistent schema versions introduced by the unwanted node. From my perspective, a simple fix seems to be for Cassandra to exclude nodes that aren't part of the ring from the schema consistency requirements. Any reason that wouldn't work? And aside from a possible code patch, any recommendations as to how I can best fix this given the current 0.8.4 release?

On Thu, Oct 13, 2011 at 12:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Does nodetool removetoken not work? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Storing pre-sorted data
Hi Stephen, this is a great idea but unfortunately doesn't work for us either, as we cannot store the data in an unencrypted form. Kind regards Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote: could you prefix the data with 3-4 bytes of a linear hash of the unencrypted data? it wouldn't be a perfect sort, but you'd have less of a range to query to get the sorted values? - Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 12 Oct 2011 17:57, Matthias Pfau p...@l3s.de wrote: Unfortunately, that is not an option, as we have to store the data in a compressed and encrypted, and therefore binary and non-sortable, form.

On 10/12/2011 06:39 PM, David McNelis wrote: Is it an option to not convert the data to binary prior to inserting into Cassandra? Also, how large are the strings you're sorting? If it's viable to not convert to binary before writing to Cassandra, and you use one of the string-based column ordering techniques (utf8 or ascii, for example), then the data would be sorted without you needing to specifically worry about that. Of course, if the strings are lengthy you could run into additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau p...@l3s.de wrote: Hi there, we are currently building a prototype based on Cassandra and ran into problems implementing sorted lists containing millions of items. The special thing about the items of our lists is that Cassandra is not able to sort them, as the data is stored in a binary format which is not sortable. However, we are able to sort the data before the plain data gets encoded (our application is responsible for the order).

First Approach: Storing Lists in ColumnFamilies
***
We first tried to map the list to a single row of a ColumnFamily, in a way that the index of the list is mapped to the column names and the items of the list to the column values. The column names are increasing numbers which define the sort order. This has the major drawback that big parts of the list have to be rewritten on inserts (because the column names are numbered by their index), which are quite common.

Second Approach: Storing the whole List as Binary Data
***
We tried to store the compressed list in a single column. However, this is only feasible for smaller lists. Our lists are far too big, leading to multi-megabyte reads and writes. As we need to read and update the lists quite often, this would put our Cassandra cluster under a lot of pressure.

Ideal Solution: Native support for storing lists
***
We would be very happy with a way to store a list of sorted values without making improper use of column names for the list index. This implies that we would need a possibility to insert values at defined positions. We know that this could lead to problems with concurrent inserts in a distributed environment, but this is handled by our application logic. What are your ideas on that? Thanks Matthias

-- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com c: 219.384.5143 /A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource./
Re: Existing column(s) not readable
Hi Aaron, I guess I found it :-). I added logging for the used IndexInfo to SSTableNamesIterator.readIndexedColumns and got negative index positions for the missing columns. This is the reason why the columns are not loaded from the sstable. So I had a look at ColumnIndexer.serializeInternal and there it is:

int endPosition = 0, startPosition = -1;

Should be:

long endPosition = 0, startPosition = -1;

I'm currently running a compaction with a fixed version to verify. Best, Thomas

On 10/12/2011 11:54 PM, aaron morton wrote: Sounds a lot like the column is deleted. IIRC this is where the columns from various SSTables are reduced: https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117 The call to ColumnFamily.addColumn() is where the column instance may be merged with other instances. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 13/10/2011, at 5:33 AM, Thomas Richter wrote: Hi Aaron, I cannot read the column with a slice query. The slice query only returns data up to a certain column, and after that I only get empty results. I added log output to QueryFilter.isRelevant to see if the filter is dropping the column(s), but it doesn't even show up there. The next thing I will check is the diff between the columns contained in the json export and the columns fetched with the slice query; maybe this gives more clues... Any other ideas where to place more debugging output to see what's happening? Best, Thomas

On 10/11/2011 12:46 PM, aaron morton wrote: kewl, * Row is not deleted (other columns can be read, row survives compaction with GCGraceSeconds=0) IIRC row tombstones can hang around for a while (until gc grace has passed), and they only have an effect on columns that have a lower timestamp. So it's possible to read columns from a row with a tombstone. Can you read the column using a slice range rather than specifying its name? Aaron - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 11/10/2011, at 11:15 PM, Thomas Richter wrote: Hi Aaron, I invalidated the caches but nothing changed. I didn't get the mentioned log line either, but as I read the code, SliceByNamesReadCommand uses NamesQueryFilter and not SliceQueryFilter. Next, there is only one SSTable. I can rule out that the row is deleted, because I deleted all other rows in that CF to reduce data size and speed up testing. I set GCGraceSeconds to zero and ran a compaction. All other rows are gone, but I can still access at least one column from the remaining row. So as far as I understand it, there should not be a tombstone on the row level. To make it a list:

* One SSTable, one row
* Row is not deleted (other columns can be read, row survives compaction with GCGraceSeconds=0)
* Most columns can be read by get['row']['col'] from cassandra-cli
* Some columns can not be read by get['row']['col'] from cassandra-cli but can be found in the output of sstable2json
* Unreadable data survives compaction with GCGraceSeconds=0 (checked with sstable2json)
* Invalidating caches does not help
* Nothing in the logs

Does that point in any direction where I should look next? Best, Thomas

On 10/11/2011 10:30 AM, aaron morton wrote: Nothing jumps out. The obvious answer is that the column has been deleted. Did you check all the SSTables ?
It looks like the query was answered from the row cache, otherwise you would see this as well…

DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 1318294191654059:false:354@1318294191654861

Which would mean a version of the column was found. If you invalidate the cache with nodetool and run the query and the log message appears, it will mean the column was read from (all of the) sstables. If you do not get a column returned, I would say there is a tombstone in place. It's either a row-level or a column-level one. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 11/10/2011, at 10:35 AM, Thomas Richter wrote: Hi Aaron, normally we use hector to access cassandra, but for debugging I switched to cassandra-cli. The column can not be read by a simple get CFName['rowkey']['colname']; the response is "Value was not found". If I query another column, everything is just fine. Server log for the unsuccessful read (keyspace and CF names replaced):

DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java (line 280) get
DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line 320) Command/ConsistencyLevel is SliceByNamesReadCommand(table='Keyspace',
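To see why an int index position goes negative, here is a standalone illustration of my own (not Cassandra source): once the serialized position within a large row passes 2 GB, narrowing it to an int overflows, which matches the negative IndexInfo positions logged above and the int-to-long fix in ColumnIndexer.serializeInternal.

public class IntOffsetOverflow {
    public static void main(String[] args) {
        long position = 3L * 1024 * 1024 * 1024; // an offset 3 GB into a large row
        int intPosition = (int) position;        // narrowing conversion overflows
        System.out.println(intPosition);         // prints -1073741824
        System.out.println(position);            // prints 3221225472, the real offset
    }
}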
Re: Cassandra as session store under heavy load
durable_writes sounds great - thank you! I really do not need the commit log here. Another question: is it possible to configure the lifetime of tombstones? Regards, Maciej
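The knob being asked about is gc_grace_seconds, set per column family, which controls how long tombstones are kept before compaction may purge them. A minimal Hector-style sketch, assuming Hector's BasicColumnFamilyDefinition API; the keyspace and CF names are placeholders:

import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;

public class SessionCfSetup {
    // Define a CF whose tombstones may be purged after one day.
    public static void addSessionCf(Cluster cluster) {
        BasicColumnFamilyDefinition cfDef = new BasicColumnFamilyDefinition();
        cfDef.setKeyspaceName("Sessions");   // placeholder keyspace
        cfDef.setName("WebSessions");        // placeholder CF name
        cfDef.setGcGraceSeconds(86400);      // tombstone lifetime floor, in seconds
        cluster.addColumnFamily(new ThriftCfDef(cfDef));
    }
}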
Re: Storing pre-sorted data
Matthias, This is an interesting problem. I would consider using longs as the column type, where your column names are evenly distributed longs in sort order when you first write your list out. So if you have items A and C with the long column names 1000 and 2000, and then you have to insert B, it gets inserted at 1500. Once you run out of room between any two column name entries, i.e., 1000, 1001, 1002 are all taken at some spot in the list, go ahead and re-write the list. If your unencrypted data is uniformly distributed, you will have very few collisions on your column names and should not have to re-write the list too often. If your lists are small enough, then you could use ints to save space, but will then have to re-write the list more often. Thanks, Zach

On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau p...@l3s.de wrote: Hi Stephen, this is a great idea but unfortunately doesn't work for us either, as we cannot store the data in an unencrypted form. Kind regards Matthias
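A minimal sketch of the numbering scheme Zach describes, written against a plain in-memory map rather than Cassandra (my own illustration; the spacing constant and rewrite policy are assumptions): columns are keyed by evenly spaced longs, an insert takes the midpoint of its neighbors, and the whole list is re-spaced once a gap closes.

import java.util.List;
import java.util.TreeMap;

public class GapNumberedList {
    // Ordered stand-in for one Cassandra row: column name (long) -> value.
    private final TreeMap<Long, byte[]> columns = new TreeMap<>();
    private static final long SPACING = 1L << 32; // initial gap between names

    // Bulk-load an already-sorted list with evenly spaced column names.
    public void load(List<byte[]> sorted) {
        long name = SPACING;
        for (byte[] item : sorted) {
            columns.put(name, item);
            name += SPACING;
        }
    }

    // Insert between two existing column names, e.g. 1000 and 2000 -> 1500.
    // Returns false when the gap is exhausted and the list had to be re-spaced;
    // the caller retries the insert afterwards.
    public boolean insertBetween(long before, long after, byte[] item) {
        long mid = before + (after - before) / 2;
        if (mid == before) { // names like 1000, 1001 leave no room
            respace();
            return false;
        }
        columns.put(mid, item);
        return true;
    }

    // The occasional full rewrite Zach mentions: re-number every column.
    private void respace() {
        TreeMap<Long, byte[]> fresh = new TreeMap<>();
        long name = SPACING;
        for (byte[] value : columns.values()) {
            fresh.put(name, value);
            name += SPACING;
        }
        columns.clear();
        columns.putAll(fresh);
    }
}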
Re: [Solved] column index offset miscalculation (was: Existing column(s) not readable)
JIRA is not read-only; you should be able to create a ticket at https://issues.apache.org/jira/browse/CASSANDRA, though that probably requires that you create an account. -- Sylvain

On Thu, Oct 13, 2011 at 3:20 PM, Thomas Richter t...@tricnet.de wrote: Hi Aaron, the fix does the trick. I wonder why nobody else ran into this before... I checked org/apache/cassandra/db/ColumnIndexer.java in 0.7.9, 0.8.7 and 1.0.0-rc2 and all seem to be affected. Looks like the public Jira is read-only - so I'm not sure about how to continue. Best, Thomas
Re: supercolumns vs. prefixing columns of same data type?
Hi Dean, I don't have an answer to your question, but just in case you haven't seen this screencast by Ed Anuff on Cassandra indexes, it helped me a lot: http://blip.tv/datastax/indexing-in-cassandra-5495633 Hani

On Wed, Oct 12, 2011 at 12:18 PM, Dean Hiller d...@alvazan.com wrote: I heard Cassandra may be going in the direction of removing supercolumns, and that users are starting to just use prefixes in front of the column. The reason I ask is that I was going the way of only using supercolumns, and many tables ended up with just one supercolumn per row since the structure for those tables was simple. This kept the API we have on top of Hector extremely simple, not having to deal with columns vs. supercolumns. What are people's thoughts on this? Dealing in column families where some have supercolumns and some don't is, I personally think, a painful way to go. Going with just one way and sticking with it sure makes the APIs easier, and it's much easier to apply AOP-type stuff to that ONE insert method rather than having two insert methods. So what is the direction of the Cassandra project and the recommendation? thanks, Dean
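For anyone unfamiliar with the prefixing idea Dean mentions, here is a minimal Hector-style sketch (my own illustration; the ':' delimiter and all names are placeholders, not a project recommendation): instead of CF[row][superColumn][subColumn], the would-be supercolumn name is folded into the column name, so a string comparator keeps all columns of one logical group adjacent and sliceable.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class PrefixedColumns {
    private static final StringSerializer SS = StringSerializer.get();

    // Write CF[rowKey]["prefix:subColumn"] = value instead of using a
    // supercolumn. A column slice from "prefix:" upwards then plays the
    // role of reading one supercolumn's children.
    public static void insert(Keyspace keyspace, String columnFamily,
                              String rowKey, String prefix,
                              String subColumn, String value) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, SS);
        mutator.insert(rowKey, columnFamily,
                HFactory.createStringColumn(prefix + ":" + subColumn, value));
    }
}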
Re: Schema versions reflect schemas on unwanted nodes
Do you have the same seed node specified in cass-analysis-1 as in cass-1,2,3? I am thinking that changing the seed node in cass-analysis-2 and following the directions in http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve the problem. Someone please correct me.

On Thu, Oct 13, 2011 at 12:05 AM, Eric Czech e...@nextbigsound.com wrote: I don't think that's what I'm after here, since the unwanted nodes were originally assimilated into the cluster with the same initial_token values as other nodes that were already in the cluster (and that have, and still do have, useful data).
Re: Storing pre-sorted data
Hi Zach, thanks for that good idea. Unfortunately, our list needs to be rewritten often because our data is far from being evenly distributed. However, we could get this under control, but there is a more severe problem: random access is very hard to implement on a structure with undefined distances between two consecutive index numbers. We absolutely need random access because the lists are too big to do this on the application side :-( Kind regards Matthias

On 10/13/2011 02:30 PM, Zach Richardson wrote: Matthias, This is an interesting problem. I would consider using longs as the column type, where your column names are evenly distributed longs in sort order when you first write your list out.
Re: Schema versions reflect schemas on unwanted nodes
Nope, there was definitely no intersection of the seed nodes between the two clusters, so I'm fairly certain that the second cluster found out about the first through what was in the LocationInfo* system tables. Also, I don't think that procedure will really help, because I don't actually want the schema on cass-analysis-1 to be consistent with the schema in the original cluster -- I just want to totally remove it.

On Thu, Oct 13, 2011 at 8:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Do you have the same seed node specified in cass-analysis-1 as in cass-1,2,3? I am thinking that changing the seed node in cass-analysis-2 and following the directions in http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve the problem. Someone please correct me.
Re: Schema versions reflect schemas on unwanted nodes
You're running into https://issues.apache.org/jira/browse/CASSANDRA-3259. Try upgrading and doing a rolling restart. -Brandon

On Thu, Oct 13, 2011 at 9:11 AM, Eric Czech e...@nextbigsound.com wrote: Nope, there was definitely no intersection of the seed nodes between the two clusters, so I'm fairly certain that the second cluster found out about the first through what was in the LocationInfo* system tables. Also, I don't think that procedure will really help, because I don't actually want the schema on cass-analysis-1 to be consistent with the schema in the original cluster -- I just want to totally remove it.
Re: [Solved] column index offset miscalculation
Thanks for the hint. Ticket created: https://issues.apache.org/jira/browse/CASSANDRA-3358 Best, Thomas

On 10/13/2011 03:27 PM, Sylvain Lebresne wrote: JIRA is not read-only, you should be able to create a ticket at https://issues.apache.org/jira/browse/CASSANDRA, though that probably requires that you create an account. -- Sylvain
Re: supercolumns vs. prefixing columns of same data type?
great video, thanks! On Thu, Oct 13, 2011 at 7:45 AM, hani elabed hani.ela...@gmail.com wrote: Hi Dean, I don't have an answer to your question, but just in case you haven't seen this screencast by Ed Anuff on Cassandra indexes, it helped me a lot: http://blip.tv/datastax/indexing-in-cassandra-5495633 Hani
Re: Hector Problem Basic one
Hi, Hector does not retry on a down server. In the unit tests where you have just one server, Hector will pass the exception to the client. Can you please tell us what your test looks like?

2011/10/12 Wangpei (Peter) peter.wang...@huawei.com wrote: I only saw this error message when all Cassandra nodes are down. How do you get the Cluster and how do you set the hosts?

From: CASSANDRA learner [mailto:cassandralear...@gmail.com]
Sent: October 12, 2011 14:30
To: user@cassandra.apache.org
Subject: Re: Hector Problem Basic one

Thanks for the reply, Ben. Actually the problem is, I could not run a basic Hector example from Eclipse. It's throwing me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client. Can you please let me know why I am getting this?

On Tue, Oct 11, 2011 at 3:54 PM, Ben Ashton b...@bossastudios.com wrote: Hey, We had this one. Even though the Hector documentation says that it retries failed servers (every 30 seconds by default), it doesn't. Once we explicitly set it to X seconds, whenever there is a failure, i.e. with the network (AWS), it will retry and add the host back into the pool. Ben

On 11 October 2011 11:09, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Every One, Actually I was using Cassandra a long time back, and when I tried today I am getting a problem from Eclipse. When I am trying to run a basic Hector (Java) example, I am getting an exception me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client. But my server is up. nodetool also shows that it is up. I don't know what happens. 1.) Is it anything to do with the JMX port? 2.) What is the storage port in cassandra.yaml and the JMX port in cassandra-env.sh?
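For reference, a minimal sketch of the explicit retry configuration Ben describes, using Hector's CassandraHostConfigurator (the host, cluster name, and delay value are placeholders):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorConnect {
    public static Cluster connect() {
        // The host:port list must match where Thrift is actually listening.
        CassandraHostConfigurator config =
                new CassandraHostConfigurator("127.0.0.1:9160");
        // Have Hector periodically retry hosts it has marked down,
        // instead of pushing the retry burden out to the client.
        config.setRetryDownedHosts(true);
        config.setRetryDownedHostsDelayInSeconds(10);
        return HFactory.getOrCreateCluster("Test Cluster", config);
    }
}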
RE: MapReduce with two ethernet cards
I upgraded to Cassandra 0.8.7, and the problem persists. Scott

From: Brandon Williams [dri...@gmail.com] Sent: Monday, October 10, 2011 12:28 PM To: user@cassandra.apache.org Subject: Re: MapReduce with two ethernet cards

On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines scott.fi...@nisc.coop wrote: Hi all, This may be a silly question, but I'm at a bit of a loss, and was hoping for some help. I have a Cassandra cluster set up with two NICs -- one for internal communication between Cassandra machines (10.1.1.*), and one to respond to Thrift RPC (172.28.*.*). I also have a Hadoop cluster set up which, for unrelated reasons, has to remain separate from Cassandra, so I've written a little MapReduce job to copy data from Cassandra to Hadoop. However, when I try to run my job, I get

java.io.IOException: failed connecting to all endpoints 10.1.1.24,10.1.1.17,10.1.1.16

which is puzzling to me. It seems like the MR job is attempting to connect to the internal communication IPs instead of the external Thrift IPs. Since I set up a firewall to block external access to the internal IPs of Cassandra, this is obviously going to fail. So my question is: why does the Cassandra MR code seem to be grabbing the listen_address instead of the Thrift one? Presuming it's not a funky configuration error or something on my part, is that strictly necessary? All told, I'd prefer it connecting to the Thrift IPs, but if it can't, should I open up port 7000 or port 9160 between Hadoop and Cassandra? Thanks for your help, Scott

Your cassandra is old, upgrade to the latest version. -Brandon
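A note on where these addresses come from, with a sketch of the job-side settings (assuming the 0.8-era ConfigHelper API; the IPs are placeholders): ConfigHelper only controls the initial contact host, while the input splits are placed at the endpoints Cassandra returns from describe_ring -- the listen addresses -- which would explain why the job tries the 10.1.1.* IPs regardless.

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class JobSetup {
    public static void configure(Configuration conf) {
        // Initial contact for schema and ring discovery: a Thrift-facing IP.
        ConfigHelper.setInitialAddress(conf, "172.28.0.10"); // placeholder
        ConfigHelper.setRpcPort(conf, "9160");
        // The splits themselves still point at the describe_ring endpoints
        // (the 10.1.1.* addresses here), so task nodes need a route to those
        // IPs on the Thrift port as well.
    }
}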
Re: Efficiency of hector's setRowCount
Hi Don. No, it will not. IndexedSlicesQuery will read just the number of rows specified by RowCount and will go to the DB to get the next page when needed. setRowCount is doing indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith dsm...@likewise.com wrote: Hector's IndexedSlicesQuery has a setRowCount method that you can use to page through the results, as described in https://github.com/rantav/hector/wiki/User-Guide :

rangeSlicesQuery.setRowCount(1001);
...
rangeSlicesQuery.setKeys(lastRow.getKey(), "");

Is it efficient? Specifically, suppose my query returns 100,000 results and I page through batches of 1000 at a time (making 100 executes of the query). Will it internally retrieve all the results each time (but pass only the desired set of 1000 or so to me)? Or will it optimize queries to avoid the duplication? I presume the latter. :) Can IndexedSlicesQuery's setStartKey method be used for the same effect? Thanks, Don
Re: Efficiency of hector's setRowCount (and setStartKey!)
It's actually setStartKey that's the important method call (in combination with setRowCount), so I should have been clearer. The following code performs as expected, as far as returning the expected data in the expected order. I believe that the use of IndexedSlicesQuery's setStartKey will support efficient queries -- avoiding re-pulling the entire data set from Cassandra. Correct?

void demoPaging() {
    String lastKey = processPage("don", "");  // get first batch, starting with "" (smallest key)
    lastKey = processPage("don", lastKey);    // get second batch starting with previous last key
    lastKey = processPage("don", lastKey);    // get third batch starting with previous last key
    // ...
}

// return last key processed, null when no records left
String processPage(String username, String startKey) {
    String lastKey = null;
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.addEqualsExpression("user", username);
    indexedSlicesQuery.setColumnNames("source", "ip");
    indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
    indexedSlicesQuery.setStartKey(startKey);
    indexedSlicesQuery.setRowCount(batchSize);
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();
    for (Row<String, String, String> row : rows) {
        if (row == null) { continue; }
        totalCount++;
        String key = row.getKey();
        if (!startKey.equals(key)) { lastKey = key; }
    }
    totalCount--;
    return lastKey;
}

On 10/13/2011 09:15 AM, Patricio Echagüe wrote: Hi Don. No, it will not. IndexedSlicesQuery will read just the number of rows specified by RowCount and will go to the DB to get the next page when needed. setRowCount is doing indexClause.setCount(rowCount);
Re: Efficiency of hector's setRowCount (and setStartKey!)
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith dsm...@likewise.com wrote:

It's actually setStartKey that's the important method call (in combination with setRowCount), so I should have been clearer. The following code performs as expected, as far as returning the expected data in the expected order. I believe that the use of IndexedSlicesQuery's setStartKey will support efficient queries -- avoiding re-pulling the entire data set from Cassandra. Correct?

correct
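For reference, the usual shape of this paging loop against the Hector API used above (a sketch only -- the "Events" column family, the predicate values, and the stop-condition wiring are illustrative, not from the thread) fetches pageSize + 1 rows and skips the overlapping start key on every page after the first, since setStartKey is inclusive:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.IndexedSlicesQuery;

    static void pageAll(Keyspace keyspace, String username, int pageSize) {
        StringSerializer ss = StringSerializer.get();
        String start = "";
        boolean firstPage = true;
        while (true) {
            IndexedSlicesQuery<String, String, String> q =
                HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss);
            q.addEqualsExpression("user", username);
            q.setColumnNames("source", "ip");
            q.setColumnFamily("Events");            // hypothetical CF name
            q.setStartKey(start);
            q.setRowCount(pageSize + 1);            // one extra row: the start key is inclusive
            OrderedRows<String, String, String> rows = q.execute().get();
            for (Row<String, String, String> row : rows) {
                if (!firstPage && row.getKey().equals(start)) {
                    continue;                       // skip the overlap row we already handled
                }
                // ... process row ...
            }
            if (rows.getCount() <= pageSize) {
                return;                             // short page: nothing left
            }
            start = rows.peekLast().getKey();       // next page starts at the last key seen
            firstPage = false;
        }
    }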
Re: Storing pre-sorted data
Hi Zach, thanks for your additional input. You are absolutely right: the long namespace should be big enough. We are going to insert up to 2^32 values into the list. We only need support for get(index), insert(index) and remove(index), where get and insert will be used very often. Remove is also needed but used very rarely. Kind regards Matthias

On 10/13/2011 04:49 PM, Zach Richardson wrote:

Matthias, Answers below.

On Thu, Oct 13, 2011 at 9:03 AM, Matthias Pfau p...@l3s.de wrote: Hi Zach, thanks for that good idea. Unfortunately, our list needs to be rewritten often because our data is far from evenly distributed.

This shouldn't be a problem if you use longs. If you were to space them at original write (with N objects) at a distance of Long.MAX_VALUE / N, and N was 10,000,000, you could still fit another 1844674407370 entries in between.

However, we could get this under control, but there is a more severe problem: random access is very hard to implement on a structure with undefined distances between two consecutive index numbers. We absolutely need random access because the lists are too big to do this on the application side :-(

I'm guessing you need to be able to implement all of the traditional get(index), set(index), insert(index) type operations on the list. Once you start trying to do that, you start to hit all of the same problems you get with different in-memory list implementations, based on which operation is most important. Could you provide some more information on what operations will be performed the most, and how important they are? I think that would help anyone recommend a path to take. Zach

Kind regards Matthias

On 10/13/2011 02:30 PM, Zach Richardson wrote:

Matthias, This is an interesting problem. I would consider using longs as the column type, where your column names are evenly distributed longs in sort order when you first write your list out. So if you have items A and C with the long column names 1000 and 2000, and then you have to insert B, it gets inserted at 1500. Once you run out of room between any two column name entries, i.e. the 1000, 1001, 1002 entries are all taken at some spot in the list, go ahead and re-write the list. If your unencrypted data is uniformly distributed, you will have very few collisions on your column names and should not have to re-write the list too often. If your lists are small enough, then you could use ints to save space, but will then have to re-write the list more often. Thanks, Zach

On Thu, Oct 13, 2011 at 2:47 AM, Matthias Pfau p...@l3s.de wrote: Hi Stephen, this is a great idea but unfortunately doesn't work for us either, as we can not store the data in an unencrypted form. Kind regards Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote: could you prefix the data with 3-4 bytes of a linear hash of the unencrypted data? it wouldn't be a perfect sort, but you'd have less of a range to query to get the sorted values? - Stephen

On 12 Oct 2011 17:57, Matthias Pfau p...@l3s.de wrote: Unfortunately, that is not an option as we have to store the data in a compressed and encrypted, and therefore binary and non-sortable, form.

On 10/12/2011 06:39 PM, David McNelis wrote: Is it an option to not convert the data to binary prior to inserting into Cassandra? Also, how large are the strings you're sorting?
If it's viable to not convert to binary before writing to Cassandra, and you use one of the string-based column ordering techniques (utf8, ascii, for example), then the data would be sorted without you needing to specifically worry about that. Of course, if the strings are lengthy you could run into additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau p...@l3s.de wrote:

Hi there, we are currently building a prototype based on Cassandra and ran into problems implementing sorted lists containing millions of items. The special thing about the items of our lists is that Cassandra is not able to sort them, as the data is stored in a binary format which is not sortable. However, we are able to sort the data before the plain data gets encoded (our application is responsible for the order).

First Approach: Storing Lists in ColumnFamilies: We first tried to map the list to a single row of a ColumnFamily, in a way that the index of the list is mapped to the column names and the items of the list to the column values. The column names are increasing numbers which define the sort order.
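Zach's gap-based spacing scheme is easy to sketch. The fragment below is purely illustrative (the class and method names are made up, and the full-list rewrite on gap exhaustion is left to the caller), not code from the thread:

    // Gap-based long column names: spaced evenly at first write; an insert
    // between two neighbours takes the midpoint; a full rewrite of the list
    // happens only when the gap between two neighbours is exhausted.
    final class GapList {
        /** Initial spacing for n items across the positive long range. */
        static long[] initialNames(int n) {
            long step = Long.MAX_VALUE / (n + 1L);
            long[] names = new long[n];
            for (int i = 0; i < n; i++) {
                names[i] = (i + 1L) * step;   // evenly spaced, with room at both ends
            }
            return names;
        }

        /** Midpoint between two existing column names, or -1 if the gap is used up. */
        static long between(long left, long right) {
            if (right - left < 2) {
                return -1;                    // no room left: caller must rewrite the list
            }
            return left + (right - left) / 2; // midpoint without overflow (names are non-negative)
        }
    }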
Re: MapReduce with two ethernet cards
What is your rpc_address set to? If it's 0.0.0.0 (bind everything) then that's not going to work if listen_address is blocked. -Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines scott.fi...@nisc.coop wrote: I upgraded to Cassandra 0.8.7, and the problem persists. Scott

From: Brandon Williams [dri...@gmail.com] Sent: Monday, October 10, 2011 12:28 PM To: user@cassandra.apache.org Subject: Re: MapReduce with two ethernet cards

On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines scott.fi...@nisc.coop wrote:

Hi all, This may be a silly question, but I'm at a bit of a loss, and was hoping for some help. I have a Cassandra cluster set up with two NICs--one for internal communication between Cassandra machines (10.1.1.*), and one to respond to Thrift RPC (172.28.*.*). I also have a Hadoop cluster set up which, for unrelated reasons, has to remain separate from Cassandra, so I've written a little MapReduce job to copy data from Cassandra to Hadoop. However, when I try to run my job, I get: java.io.IOException: failed connecting to all endpoints 10.1.1.24,10.1.1.17,10.1.1.16, which is puzzling to me. It seems like the MR job is attempting to connect to the internal communication IPs instead of the external Thrift IPs. Since I set up a firewall to block external access to the internal IPs of Cassandra, this is obviously going to fail. So my question is: why does the Cassandra MR support seem to grab the listen_address instead of the Thrift one? Presuming it's not a funky configuration error or something on my part, is that strictly necessary? All told, I'd prefer it to connect to the Thrift IPs, but if it can't, should I open up port 7000 or port 9160 between Hadoop and Cassandra? Thanks for your help, Scott

Your cassandra is old, upgrade to the latest version. -Brandon
RE: MapReduce with two ethernet cards
The listen addresses on all machines are set to the 10.1.1.* addresses, while the thrift rpc addresses are the 172.28.* addresses.

From: Brandon Williams [dri...@gmail.com] Sent: Thursday, October 13, 2011 12:28 PM To: user@cassandra.apache.org Subject: Re: MapReduce with two ethernet cards

What is your rpc_address set to? If it's 0.0.0.0 (bind everything) then that's not going to work if listen_address is blocked. -Brandon
RE: MapReduce with two ethernet cards
When I look at the source for ColumnFamilyInputFormat, it appears that it makes a call to client.describe_ring; when you do the equivalent call with nodetool, you get the 10.1.1.* addresses. This seems to indicate that I should open up the firewall and attempt to contact those IPs instead of the normal thrift IPs. That leads me to think I need to have thrift listening on both IPs, though. Would that then be the case? Scott

From: Scott Fines [scott.fi...@nisc.coop] Sent: Thursday, October 13, 2011 12:40 PM To: user@cassandra.apache.org Subject: RE: MapReduce with two ethernet cards

The listen addresses on all machines are set to the 10.1.1.* addresses, while the thrift rpc addresses are the 172.28.* addresses.
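The describe_ring call Scott mentions is easy to reproduce directly against the thrift API to see what endpoints the input format will be handed. A hedged sketch (host, port, and keyspace name are placeholders; on this cluster the printed endpoints would be the 10.1.1.* listen addresses, not the thrift ones):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class DescribeRingCheck {
        public static void main(String[] args) throws Exception {
            // Connect to any node's thrift port (framed transport for 0.8).
            TFramedTransport transport = new TFramedTransport(new TSocket("172.28.0.1", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            // describe_ring reports each token range's endpoints as gossip
            // (listen_address) IPs, which is what the MR job then connects to.
            for (TokenRange range : client.describe_ring("MyKeyspace")) {
                System.out.println(range.start_token + " -> " + range.end_token
                    + " endpoints=" + range.endpoints);
            }
            transport.close();
        }
    }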
Re: Storing pre-sorted data
in theory, however they have less than 32 bits of entropy from which they can do that, leaving them with at least 32 more bits of combinations to try... that's 2 billion or so... must be a big dictionary - Stephen

--- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 13 Oct 2011 17:57, Matthias Pfau p...@l3s.de wrote:

Hi Stephen, this sounds very reasonable. But wouldn't this enable an attacker to execute dictionary attacks in order to decrypt the first 8 bytes of the plain text? Kind regards Matthias

On 10/13/2011 05:03 PM, Stephen Connolly wrote:

It wouldn't be unencrypted... which is the point. You use a one-way linear hash function to take the first, say, 8 bytes of unencrypted data and turn it into 4 bytes of a sort prefix. You've lost half the data in the process, so effectively each bit is an OR of two bits and you can only infer from 0 values... so the data is still encrypted, but you have an approximate sorting. For example, if your data is US-ASCII text with no numbers, you could use Soundex to get the pre-key, so that worst case you have a bucket of values in the range. Using this technique, a random get will have to get the values at the desired prefix +/- a small amount rather than the whole row... on the client side you can then decrypt the data and sort that small bucket to get the correct index position. You could do a 1-byte prefix, but that only gives you at best 256 buckets and assumes that the first 2 bytes are uniformly distributed... you've said your data is not uniformly distributed, so a linear hash function sounds like your best bet. Your hash function should have the property that hash(A) <= hash(B) if and only if A <= B.

On 13 October 2011 08:47, Matthias Pfau p...@l3s.de wrote:

Hi Stephen, this is a great idea but unfortunately doesn't work for us either, as we can not store the data in an unencrypted form. Kind regards Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:

could you prefix the data with 3-4 bytes of a linear hash of the unencrypted data? it wouldn't be a perfect sort, but you'd have less of a range to query to get the sorted values? - Stephen

On 12 Oct 2011 17:57, Matthias Pfau p...@l3s.de wrote:

Unfortunately, that is not an option as we have to store the data in a compressed and encrypted, and therefore binary and non-sortable, form.

On 10/12/2011 06:39 PM, David McNelis wrote:

Is it an option to not convert the data to binary prior to inserting into Cassandra? Also, how large are the strings you're sorting? If it's viable to not convert to binary before writing to Cassandra, and you use one of the string-based column ordering techniques (utf8, ascii, for example), then the data would be sorted without you needing to specifically worry about that. Of course, if the strings are lengthy you could run into additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau p...@l3s.de wrote:

Hi there, we are currently building a prototype based on Cassandra and ran into problems implementing sorted lists containing millions of items. The special thing about the items of our lists is that Cassandra is not able to sort them, as the data is stored in a binary format which is not sortable.
However, we are able to sort the data before the plain data gets encoded (our application is responsible for the order).

First Approach: Storing Lists in ColumnFamilies: We first tried to map the list to a single row of a ColumnFamily, in a way that the index of the list is mapped to the column names and the items of the list to the column values. The column names are increasing numbers which define the sort order. This has the major drawback that big parts of the list have to be rewritten on inserts (because the column names are numbered by their index), which are quite common.

Second Approach: Storing the whole List as Binary Data: We tried to store the compressed list in a single column. However, this is only feasible for smaller lists. Our lists are far too big, leading to multi-megabyte reads and writes. As we need to read and update the lists quite
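One literal reading of Stephen's monotone-hash property, as a purely illustrative sketch (not from the thread): pack the first 8 bytes big-endian and keep the top half. Note the caveat in the comments -- this particular truncation exposes the leading 4 bytes outright, which is exactly the dictionary-attack worry Matthias raises below, so treat it as a strawman for the ordering property rather than a safe construction:

    // Illustrative only: a "sort prefix" with hash(A) <= hash(B) whenever
    // A <= B bytewise. Keeping the top 32 of the packed 64 bits preserves
    // order but is many-to-one past the 4th byte.
    // Caveat: the kept bits ARE the first 4 bytes verbatim, so this variant
    // leaks them; it demonstrates the property, not a secure scheme.
    static long sortPrefix(byte[] plaintext) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            int b = i < plaintext.length ? (plaintext[i] & 0xFF) : 0; // pad short input with 0
            v = (v << 8) | b;
        }
        return v >>> 32; // top 32 bits: in [0, 2^32), monotone in unsigned byte order
    }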
Re: Storing pre-sorted data
Hi Stephen, we are hashing the first 8 bytes (8 US-ASCII characters) of text that has been written by humans. Wouldn't it be easy for the attacker to do a dictionary attack on this text, especially if he knows the language of the text? Kind regards Matthias

On 10/13/2011 08:20 PM, Stephen Connolly wrote:

in theory, however they have less than 32 bits of entropy from which they can do that, leaving them with at least 32 more bits of combinations to try... that's 2 billion or so... must be a big dictionary - Stephen
Re: Storing pre-sorted data
Then just use a soundex function on the first word in the text... that will shrink it sufficiently and give nice buckets in near-sequential order (http://en.wikipedia.org/wiki/Soundex).

On 13 October 2011 21:21, Matthias Pfau p...@l3s.de wrote:

Hi Stephen, we are hashing the first 8 bytes (8 US-ASCII characters) of text that has been written by humans. Wouldn't it be easy for the attacker to do a dictionary attack on this text, especially if he knows the language of the text? Kind regards Matthias
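Stephen's Soundex suggestion is straightforward to try with Apache Commons Codec; only the Soundex call is real API, the wrapper and its fallback value are illustrative:

    import org.apache.commons.codec.language.Soundex;

    public class SortPrefixes {
        private static final Soundex SOUNDEX = new Soundex();

        /** Coarse bucket key for the first word of some human-written text. */
        static String bucketFor(String plaintext) {
            // Soundex is defined over letters, so keep only A-Z from the first word.
            String first = plaintext.trim().split("\\s+")[0].replaceAll("[^A-Za-z]", "");
            // "0000" is an arbitrary bucket for letterless input.
            return first.isEmpty() ? "0000" : SOUNDEX.soundex(first); // e.g. "Robert" -> "R163"
        }
    }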
Re: Cassandra as session store under heavy load
Or upgrade to 1.0 and use leveled compaction (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra).

On Thu, Oct 13, 2011 at 4:28 PM, aaron morton aa...@thelastpickle.com wrote:

They only have a minimum time, gc_grace_seconds, for deletes. If you want to really watch disk space, reduce the compaction thresholds on the CF, or run a major compaction as part of maintenance. cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 13/10/2011, at 10:50 PM, Maciej Miklas wrote: durable_writes sounds great - thank you! I really do not need the commit log here. Another question: is it possible to configure the lifetime of tombstones? Regards, Maciej

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
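On the tombstone-lifetime question: gc_grace_seconds is a per-column-family setting, so it can be lowered just where the sessions live. A hedged Hector sketch (keyspace, CF name, and the one-hour value are placeholders; the copy-then-update dance is one common way to do it, not the only one):

    import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
    import me.prettyprint.cassandra.service.ThriftCfDef;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ShortenTombstoneGrace {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Main", "localhost:9160"); // placeholders
            for (ColumnFamilyDefinition cf : cluster.describeKeyspace("SessionsKS").getCfDefs()) {
                if (cf.getName().equals("WebSessions")) {                            // placeholder CF
                    // Copy the definition, lower gc_grace, push the schema change.
                    BasicColumnFamilyDefinition def = new BasicColumnFamilyDefinition(cf);
                    def.setGcGraceSeconds(3600);   // down from the 864000s (10 day) default
                    cluster.updateColumnFamily(new ThriftCfDef(def));
                }
            }
        }
    }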
Restore snapshots suggestion
If I need to restore snapshots on all nodes, but can only shut down one node at a time since it is production, is there a way I can stop data syncing between nodes temporarily? I don't want the existing data to overwrite the snapshot. I found the undocumented boolean parameter DoConsistencyChecks (http://www.datastax.com/dev/blog/whats-new-cassandra-066) to disable read repair; what is the proper way to do it? I am on 0.8.6. Thank you in advance, Daning
Re: Schema versions reflect schemas on unwanted nodes
Thanks Brandon! Out of curiosity, would making schema changes through a thrift interface (via hector) be any different? In other words, would using hector instead of the cli make schema changes possible without upgrading?

On Thu, Oct 13, 2011 at 8:22 AM, Brandon Williams dri...@gmail.com wrote:

You're running into https://issues.apache.org/jira/browse/CASSANDRA-3259 Try upgrading and doing a rolling restart. -Brandon

On Thu, Oct 13, 2011 at 9:11 AM, Eric Czech e...@nextbigsound.com wrote:

Nope, there was definitely no intersection of the seed nodes between the two clusters, so I'm fairly certain that the second cluster found out about the first through what was in the LocationInfo* system tables. Also, I don't think that procedure will really help, because I don't actually want the schema on cass-analysis-1 to be consistent with the schema in the original cluster -- I just want to totally remove it.

On Thu, Oct 13, 2011 at 8:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

Do you have the same seed node specified in cass-analysis-1 as in cass-1,2,3? I am thinking that changing the seed node in cass-analysis-2 and following the directions in http://wiki.apache.org/cassandra/FAQ#schema_disagreement might solve the problem. Someone please correct me.
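As far as I know, a schema change issued through Hector ends up at the same server-side thrift schema-mutation handling the cli uses, which refuses to proceed while schema versions disagree, so switching clients would not avoid the agreement check. A minimal sketch of what such a change looks like (cluster, keyspace, and CF names are placeholders):

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class AddCfViaHector {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Main", "cass-1:9160"); // placeholders
            ColumnFamilyDefinition cfDef =
                HFactory.createColumnFamilyDefinition("MyKeyspace", "MyNewCF");   // placeholders
            // Goes over thrift's schema-mutation path, the same one the cli
            // uses, so it hits the same schema-agreement requirement.
            cluster.addColumnFamily(cfDef);
        }
    }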