Re: Cassandra at Amazon AWS
On a side note: if you are going for Priam and you are using LeveledCompaction, think carefully about whether you need incremental backups. The S3 upload cost can be very high, because leveled compaction tends to create a lot of files and each PUT request to S3 costs money. We had this setup in a relatively small cluster of 4 nodes, where the switch to leveled compaction increased backup cost by 800 euros a month.

Greetings,
Roland

From: Roland Gude [mailto:roland.g...@ez.no]
Sent: Friday, 18 January 2013 09:23
To: user@cassandra.apache.org
Subject: Re: Cassandra at Amazon AWS

Priam is good for backups, but it is another complex (though very good) part of a software stack. A simple solution is to take regular snapshots (via cron), compress them and put them into S3. On S3 you can simply choose how many days the files are kept. This can be done with a couple of lines of shell script and a simple crontab entry.

From: Marcelo Elias Del Valle [mailto:mvall...@gmail.com]
Sent: Friday, 18 January 2013 04:53
To: user@cassandra.apache.org
Subject: Re: Cassandra at Amazon AWS

Everyone, thanks a lot for the answers, they helped me a lot.

2013/1/17 Andrey Ilinykh <ailin...@gmail.com>

I'd recommend Priam. http://techblog.netflix.com/2012/02/announcing-priam.html

Andrey

On Thu, Jan 17, 2013 at 5:44 AM, Adam Venturella <aventure...@gmail.com> wrote:

Jared, how do you guys handle data backups for your ephemeral-based cluster? I'm trying to move to ephemeral drives myself, and that was my last sticking point: asking how others in the community deal with backup in case the VM explodes.

On Wed, Jan 16, 2013 at 1:21 PM, Jared Biel <jared.b...@bolderthinking.com> wrote:

We're currently using Cassandra on EC2 at very low scale (a 2-node cluster on m1.large instances in two regions). I don't believe that EBS is recommended, for performance reasons. Also, it has proven to be very unreliable in the past (most of the big/notable AWS outages were due to EBS issues). We've moved 99% of our instances off of EBS.

As others have said, if you require more space in the future it's easy to add more nodes to the cluster. I've found this page (http://www.ec2instances.info/) very useful in determining the amount of space each instance type has. Note that by default only one ephemeral drive is attached, and you must specify all ephemeral drives that you want to use at launch time. Also, you can create a RAID 0 of all local disks to provide maximum speed and space.

On 16 January 2013 20:42, Marcelo Elias Del Valle <mvall...@gmail.com> wrote:

Hello,

I am currently using Hadoop + Cassandra at Amazon AWS. Cassandra runs on EC2 and my Hadoop process runs on EMR. For Cassandra storage, I am using local EC2 EBS disks. My system is running fine for my tests, but to me it's not a good setup for production. I need my system to perform well especially for writes on Cassandra, but the amount of data could grow really big, taking several TB of total storage.

My first guess was using S3 as storage, and I saw this can be done with the Cloudian package, but I wouldn't like to become dependent on a pre-packaged solution, and I found it's kind of expensive for more than 100 TB: http://www.cloudian.com/pricing.html

I saw some discussion on the internet about using EBS or ephemeral disks for storage at Amazon too. My question is: does someone on this list have the same problem as me? What are you using as a solution for Cassandra's storage when running it at Amazon AWS? Any thoughts would be highly appreciated.

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
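The snapshot-and-upload routine suggested above (cron + compress + S3, with S3 lifecycle rules handling retention) could be sketched like this. The bucket name, data directory, and the choice of s3cmd as the upload tool are illustrative assumptions, not anything prescribed in the thread:

```python
import datetime
import subprocess

def backup_commands(keyspace, bucket, data_dir="/var/lib/cassandra/data", when=None):
    # Build the commands for one snapshot -> compress -> upload cycle.
    # Bucket, data_dir and upload tool (s3cmd) are placeholders.
    when = when or datetime.date.today()
    tag = "backup_%s" % when.strftime("%Y%m%d")
    archive = "/tmp/%s_%s.tar.gz" % (keyspace, tag)
    return [
        ["nodetool", "snapshot", "-t", tag, keyspace],           # flush + hard-link SSTables
        ["tar", "czf", archive, "%s/%s" % (data_dir, keyspace)], # compress the snapshot
        ["s3cmd", "put", archive, "s3://%s/%s" % (bucket, tag)], # upload to S3
        ["nodetool", "clearsnapshot", keyspace],                 # free the disk space again
    ]

def run_backup(keyspace, bucket):
    for cmd in backup_commands(keyspace, bucket):
        subprocess.check_call(cmd)
```

A crontab entry pointing at such a script would run it nightly; expiring old backups is then just an S3 lifecycle rule on the bucket, as the mail notes.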
Re: TTL on SecondaryIndex Columns. A bug?
I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670. Unfortunately, apart from me no one was yet able to reproduce it. Check if the data is available before/after compaction. If you have leveled compaction it is hard to test, because you cannot trigger compaction manually.

-----Original Message-----
From: Alexei Bakanov [mailto:russ...@gmail.com]
Sent: Wednesday, 19 December 2012 09:35
To: user@cassandra.apache.org
Subject: Re: TTL on SecondaryIndex Columns. A bug?

I'm running on a single node on my laptop. It looks like the point when rows disappear from the index depends on JVM memory settings. With more memory, it needs more data fed in before things start disappearing. Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M.

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';
0 Row Returned.

Thanks

On 19 December 2012 05:15, aaron morton <aa...@thelastpickle.com> wrote:

Thanks for the nice steps to reproduce. I ran this on my MBP using C* 1.1.7 and got the expected results; both gets returned a row. Were you running against a single node or a cluster? If a cluster, did you change the CL? cassandra-cli defaults to ONE.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 18/12/2012, at 9:44 PM, Alexei Bakanov <russ...@gmail.com> wrote:

Hi,

We are having an issue with TTL on secondary index columns. We get 0 rows in return when running queries on indexed columns that have a TTL. Everything works fine with small amounts of data, but when we get over a certain threshold it looks like older rows disappear from the index. In the example below we create 70 rows with 45k columns each, plus one indexed column with just the row key as value, so we have one row per indexed value. When the script is finished, the index contains rows 66-69. Rows 0-65 are gone from the index. Using 'indexedColumn' without TTL fixes the problem.
- SCHEMA START -

create keyspace ks123
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use ks123;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'AsciiType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn', validation_class : AsciiType, index_name : 'INDEX1', index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

- SCHEMA FINISH -

- POPULATE START -

from pycassa.batch import Mutator
import pycassa

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')
for rowKey in xrange(70):
    b = Mutator(pool)
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()
pool.dispose()

- POPULATE FINISH -

- QUERY START -

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
---
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).

- QUERY FINISH -

This is all using Cassandra 1.1.7 with default settings.

Best regards,
Alexei Bakanov
Re: Replication Factor and Consistency Level Confusion
Hi,

RF 2 means that 2 nodes are responsible for any given row (no matter how many nodes are in the cluster). For your cluster with three nodes, let's just assume the following responsibilities:

Node          A       B      C
Primary keys  0-5     6-10   11-15
Replica keys  11-15   0-5    6-10

Assume node C is down:

Writing any key in range 0-5 with consistency TWO is possible (A and B are up).
Writing any key in range 11-15 with consistency TWO will fail (C is down and 11-15 is its primary range).
Writing any key in range 6-10 with consistency TWO will fail (C is down and it is the replica for this range).

I hope this explains it.

-----Original Message-----
From: Vasileios Vlachos [mailto:vasileiosvlac...@gmail.com]
Sent: Wednesday, 19 December 2012 17:07
To: user@cassandra.apache.org
Subject: Replication Factor and Consistency Level Confusion

Hello All,

We have a 3-node cluster and we created a keyspace (say Test_1) with replication factor set to 3. I know that is not great, but we wanted to test different behaviors. So, we created a column family (say cf_1) and we tried writing something with consistency level ANY, ONE, TWO, THREE, QUORUM and ALL. We did that while all nodes were in the UP state, so we had no problems at all: no matter what the consistency level was, we were able to insert a value.

Same cluster, different keyspace (say Test_2) with replication factor set to 2 this time, and one of the 3 nodes deliberately DOWN. Again, we created a column family (say cf_1) and we tried writing something with different consistency levels. Here is what we got:

ANY: worked (expected...)
ONE: worked (expected...)
TWO: did not work (what???)
THREE: did not work (expected...)
QUORUM: worked (expected...)
ALL: did not work (expected, I guess...)

Now, we know that QUORUM derives from (RF/2)+1, so we were expecting that to work; after all, only 1 node was DOWN. Why did consistency level TWO not work then?

Third test...

Same cluster again, different keyspace (say Test_3) with replication factor set to 3 this time, and 1 of the 3 nodes deliberately DOWN again. Same approach again: we created a different column family (say cf_1), and the different consistency level settings resulted in the following:

ANY: worked (what???)
ONE: worked (what???)
TWO: did not work (what???)
THREE: did not work (expected...)
QUORUM: worked (what???)
ALL: worked (what???)

We thought that if the replication factor is greater than the number of nodes in the cluster, writes are blocked. Apparently we are completely missing a level of understanding here, so we would appreciate any help!

Thank you in advance!

Vasilis
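A small model makes the explanation above concrete: whether a write succeeds depends on how many of the key's RF replicas are alive, not on how many cluster nodes are up. The sketch below is my own simplification (for instance, CL ANY can in reality succeed via hinted handoff even when no replica of the key is up):

```python
def write_succeeds(cl, rf, live_replicas):
    # Replicas that must acknowledge a write at each consistency level.
    # Simplified: ANY is modeled as needing one live replica, although
    # hinted handoff actually lets it succeed via any live node.
    needed = {
        "ANY": 1,
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }[cl]
    return live_replicas >= needed
```

With RF=2 and one replica of the key down, `write_succeeds("TWO", 2, 1)` is False even though two of the three cluster nodes are up. Note also that with RF=2, QUORUM is 2//2+1 = 2, the same as TWO and ALL, so for a key replicated to the downed node all three levels should behave identically; mixed results as in the tests above suggest the test keys landed on different replica sets.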
Re: secondary indexes TTL - strange issues
Issue created, will attach debug logs asap: CASSANDRA-4670 <https://issues.apache.org/jira/browse/CASSANDRA-4670>

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, 17 September 2012 03:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

> Data gets inserted and is accessible via index query for some time. At some point in time the indexes are completely empty and start filling again (while new data enters the system).

If you can reproduce this, please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA . If you can include DEBUG level logs that would be helpful.

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 10:08 PM, Roland Gude <roland.g...@ez.no> wrote:

I am not sure it is compacting an old file: the same thing happens every time I rebuild the index. New files appear, get compacted and vanish. We have set up a new, smaller cluster with fresh data. The same thing happens here as well. Data gets inserted and is accessible via index query for some time. At some point in time the indexes are completely empty and start filling again (while new data enters the system).

I am currently testing with SizeTiered on both the fresh set and the imported set. For the fresh set (which is significantly smaller), first results imply that the issue does not happen with SizeTieredCompaction. I have not yet tested everything that comes to mind and will update if something new comes up.

As for the failing query, it is from the cli (and equivalent queries with Astyanax and Hector as well):

get EventsByItem where 0003--1000--=utf8('someValue');

0003--1000-- is a TimeUUID we use as a marker for a time series.

This is a CF with the issue:

create column family EventsByItem
  with column_type = 'Standard'
  and comparator = 'TimeUUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 0.5
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
    {column_name : '--1000--', validation_class : BytesType, index_name : 'ebi_mandatorIndex', index_type : 0},
    {column_name : '0002--1000--', validation_class : BytesType, index_name : 'ebi_itemidIndex', index_type : 0},
    {column_name : '0003--1000--', validation_class : BytesType, index_name : 'ebi_eventtypeIndex', index_type : 0}]
  and compression_options = {sstable_compression : SnappyCompressor, chunk_length_kb : 64};

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, 14 September 2012 10:46
To: user@cassandra.apache.org
Subject: Re: secondary indexes TTL - strange issues

> INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,]. 78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s. Time: 1,272,883ms.

There are a lot of weird things here. It could be levelled compaction compacting an older file for the first time. But that would be a guess.

> Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again.

Are you able to do a test with SizeTieredCompaction? Are you able to replicate the problem with a fresh testing CF and some test data? If it's only a problem with imported data, can you provide a sample of the failing query? And maybe the CF definition?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/09/2012, at 2:46 AM, Roland Gude <roland.g...@ez.no> wrote:

Hi,

we have been running a system on Cassandra 0.7, heavily relying on secondary indexes for columns with TTL. This has been working like a charm, but we are trying hard to move forward with Cassandra and are struggling at this point: when we put our data into a new cluster (any 1.1.x version - currently 1.1.5), rebuild indexes and run our system, everything seems to work well - until at some point in time index queries do not return any data at all anymore (note that the TTL has not yet expired for several months). Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again. What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore
secondary indexes TTL - strange issues
Hi,

we have been running a system on Cassandra 0.7, heavily relying on secondary indexes for columns with TTL. This has been working like a charm, but we are trying hard to move forward with Cassandra and are struggling at this point: when we put our data into a new cluster (any 1.1.x version - currently 1.1.5), rebuild indexes and run our system, everything seems to work well - until at some point in time index queries do not return any data at all anymore (note that the TTL has not yet expired for several months). Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again. What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,]. 78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s. Time: 1,272,883ms.

Actually, we have switched to LeveledCompaction. Could it be that leveled compaction does not play nice with indexes?
Re: How to control location of data?
Hi,

I think everything is called a replica, so if data is on 3 nodes you have 3 replicas. There is no such thing as an original.

A partitioner decides into which partition a piece of data belongs. A replica placement strategy decides which partition goes on which node. You cannot suppress the partitioner. You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts. This is however discouraged for various reasons - i.e. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi!

We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online replication of the data, that is, an easy way to share data across nodes. But we need to precisely control on what node group specific parts of a key space (columns/column families) are stored. Now we're having trouble understanding the documentation. Could anyone help us find answers to these questions?

* What does the term replica mean: if a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?

* What is the relation between the Cassandra concepts "partitioner" and "replica placement strategy"? According to documentation found on the DataStax web site and the architecture internals page in the Cassandra wiki, the first storage location of a key (and its associated data) is determined by the partitioner, whereas additional storage locations are defined by the replica placement strategy. I'm wondering if I could completely redefine the way nodes are selected to store a key by just implementing my own subclass of AbstractReplicationStrategy and configuring that subclass into the key space.

* How can I suppress that the partitioner is consulted at all to determine which node stores a key first?

* Is a key space always distributed across the whole cluster? Is it possible to configure Cassandra in such a way that more or less freely chosen parts of a key space (columns) are stored on arbitrarily chosen nodes?

Any tips would be very appreciated :-)
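For the last question - pinning parts of the data to chosen nodes - one built-in approximation (my suggestion, not something stated in the thread) is to place the target node group in its own logical datacenter via the snitch/topology configuration and then create the keyspace with NetworkTopologyStrategy so that its replicas land only in that datacenter. A hypothetical cassandra-cli sketch, where DC_GROUP_A is a datacenter name you would define yourself:

```
create keyspace ks_group_a
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC_GROUP_A : 2};
```

This controls placement per keyspace at datacenter granularity, which is usually as fine-grained as it is safe to go while keeping the cluster balanced.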
Re: Re: How to control location of data?
Each node in the cluster is assigned a token (this can be done automatically, but usually should not be). The token of a node is the start token of the partition it is responsible for (and the token of the next node is the end token of the current token's partition).

Assume you have the following nodes/tokens (tokens are usually numbers, but for the example I will use letters):

N1/A
N2/D
N3/M
N4/X

This means that:

N1 is responsible (primary) for [A-D)
N2 for [D-M)
N3 for [M-X)
N4 for [X-A)

If you have a replication factor of 1, data will go on the nodes like this:

B -> N1
E -> N2
X -> N4

and so on. If you have a higher replication factor, the placement strategy decides which nodes take replicas of which partition (becoming secondary nodes for that partition). SimpleStrategy will just put the replica on the next node in the ring. Same example as above, but with RF 2 and SimpleStrategy:

B -> N1 and N2
E -> N2 and N3
X -> N4 and N1

Other strategies can factor in things like "put data in another datacenter" or "put data in another rack".

Even though the terms primary and secondary imply some measure of quality or consistency, this is not the case. If a node is responsible for a piece of data, it will store it. Placement of the replicas is usually only relevant for availability reasons (i.e. disaster recovery etc.). The actual location should mean nothing to most applications, as you can ask any node for the data you want and it will provide it to you (fetching it from the responsible nodes). This should be sufficient in almost all cases. In the above example again, you can ask N3 what data is available and it will tell you: B, E and X. Or you could ask it "give me X" and it will fetch it from N4 or N1 or both of them, depending on the consistency configuration, and return the data to you.

So if you use Cassandra, the actual storage location of the data should not matter to the application. It will be available anywhere in the cluster if it is stored on any reachable node.

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 15:06
To: user@cassandra.apache.org
Subject: Re: Re: How to control location of data?

Hi!

Thank you for your last reply. I'm still wondering if I got you right...

> A partitioner decides into which partition a piece of data belongs.

Does your statement imply that the partitioner does not take any decisions at all on the (physical) storage location? Or put another way: what do you mean by "partition"? To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. Primary replica is always determined by the token ring (...)"

> You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts. This is however discouraged for various reasons - i.e. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

What we want is to partition the cluster with respect to key spaces. That is, we want to establish an association between nodes and key spaces so that a node of the cluster holds data from a key space if and only if that node is a *member* of that key space. To our knowledge Cassandra has no built-in way to specify such a membership relation. Therefore we thought of implementing our own replica placement strategy, until we started to assume that the partitioner would have to be replaced too to accomplish the task. Do you have any ideas?

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi!

We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online replication of the data, that is, an easy way to share data across nodes. But we need to precisely control on what node group specific parts of a key space (columns/column families) are stored. Now we're having trouble understanding the documentation. Could anyone help us find answers to these questions?

* What does the term replica mean: if a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?

* What is the relation between the Cassandra concepts "partitioner" and "replica placement strategy"? According to documentation found on the DataStax web site and the architecture internals page in the Cassandra wiki, the first storage location of a key (and its associated data) is determined by the partitioner, whereas additional storage locations are defined by the replica placement strategy. I'm wondering if I could
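The letter-token example above can be run directly. A toy sketch of the token ring with SimpleStrategy-style placement (illustrative only - real Cassandra tokens come from the partitioner's hash of the key):

```python
from bisect import bisect_right

class SimpleRing:
    """Toy token ring, mirroring the N1/A .. N4/X example above."""
    def __init__(self, ring):  # ring: {token: node}
        self.tokens = sorted(ring)
        self.nodes = [ring[t] for t in self.tokens]

    def replicas(self, key_token, rf):
        # Primary owner: the node whose range [token, next_token) covers
        # the key's token, wrapping around the ring.
        i = (bisect_right(self.tokens, key_token) - 1) % len(self.tokens)
        # SimpleStrategy: further replicas go to the next rf-1 nodes clockwise.
        return [self.nodes[(i + k) % len(self.nodes)] for k in range(rf)]
```

With `ring = SimpleRing({'A': 'N1', 'D': 'N2', 'M': 'N3', 'X': 'N4'})`, `ring.replicas('B', 2)` gives `['N1', 'N2']` and `ring.replicas('X', 2)` gives `['N4', 'N1']`, matching the placement in the mail.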
Re: Garbage collection freezes cassandra node
Tuning garbage collection is really hard, especially if you do not know why garbage collection stalls. In general, I must say I have never seen software ship with as good a garbage collection configuration as Cassandra.

The thing that looks suspicious is that the major collections appear regularly, at 1-hour intervals. The only trigger I know of with that pattern is RMI's explicit collections (and I am not certain Cassandra uses RMI). You could avoid those with

-XX:+DisableExplicitGC

in the Cassandra start script. But I assume that Cassandra itself makes use of explicit GC as well, so this might have some nasty side effects. Unless someone tells you never to do that on Cassandra, I would give it a try and see what happens. Keep flushes and deletions of SSTables in mind though; they are somehow tied to GC, I think.

Another option (if it is RMI) is to make the RMI full GC interval larger, so that the resulting timeouts occur less often:

-Dsun.rmi.dgc.client.gcInterval=60
-Dsun.rmi.dgc.server.gcInterval=60

The number is the interval in ms until a GC is triggered. Anyway, the same warning as before applies.

Cheers.

From: Rene Kochen [mailto:rene.koc...@emea.schange.com]
Sent: Monday, 19 December 2011 16:35
To: user@cassandra.apache.org
Subject: Garbage collection freezes cassandra node

I recently see the following garbage collection behavior in our performance tests (the attached chart shows the heap size in MB):

[chart omitted: image001.jpg]

During the garbage collections, Cassandra freezes for about ten seconds. I observe the following log entries:

GC for ConcurrentMarkSweep: 11597 ms for 1 collections, 1887933144 used; max is 8550678528

I use Windows Server 2008 and Cassandra 0.7.10 with min and max heap size set to 8 GB. What can I do to make Cassandra not freeze? Just allocate more memory?

Thanks,

Rene
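For reference, both approaches mentioned above would go into the JVM options of the Cassandra start script (cassandra-env.sh on Unix, or the equivalent Windows batch file). The interval values below are my own illustration: the "=60" in the mail looks truncated, and the JDK default for the RMI DGC intervals in that era was commonly 3600000 ms (one hour), which would match the hourly full GCs observed; verify the defaults for your JDK before changing them.

```shell
# Hypothetical excerpt from the Cassandra start script.
# Option 1: ignore explicit System.gc() calls entirely (may have side
# effects if Cassandra relies on explicit GC, as cautioned above).
JVM_OPTS="$JVM_OPTS -XX:+DisableExplicitGC"

# Option 2: keep explicit GC but make the RMI-triggered full collections
# rarer, e.g. once per day instead of hourly (values in ms).
JVM_OPTS="$JVM_OPTS -Dsun.rmi.dgc.client.gcInterval=86400000"
JVM_OPTS="$JVM_OPTS -Dsun.rmi.dgc.server.gcInterval=86400000"
```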
Re: Pending ReadStage is exploding on only one node
Are you using indexslicequeries? I described a similar problem a couple of months ago (and mechanisms to reproduce the behavior) but unfortunately failed to create an issue for it (shame on me). The mail thread is in the archives http://www.mail-archive.com/user@cassandra.apache.org/msg16157.html Von: Johann Höchtl [mailto:h.hoec...@ic-drei.de] Gesendet: Montag, 21. November 2011 22:17 An: user@cassandra.apache.org Betreff: Re: Pending ReadStage is exploding on only one node Yes, it's random partioned. Am 21.11.2011 13:47, schrieb Jahangir Mohammed: Hmm..What's the data distribution like on cluster? R.P.? On Mon, Nov 21, 2011 at 7:31 AM, Johann Höchtl h.hoec...@ic-drei.demailto:h.hoec...@ic-drei.de wrote: I'm using hector-0.8.0-2. No custom load balancer. Hardware is equal on every server. Am 21.11.2011 13:26, schrieb Jahangir Mohammed: I am not so sure from version to version. 1. Which client are you using? Any custom load balancer? 2. Is the hardware on this node any different from other nodes? Thanks, Jahangir. On Mon, Nov 21, 2011 at 5:55 AM, Johann Höchtl h.hoec...@ic-drei.demailto:h.hoec...@ic-drei.de wrote: Hi all, I'm experiencing strange behaviour of my 6-node cassandra cluster and I hope some one can explain, what I'm doing wrong. The setting: 6-Cassandra Nodes 1.0.3 Random Partitioning The ColumnFamily in question has a replication factor of 2 and stores products of different shops with a secondary index on shop_id. Twice a day, I do an update of the data with the following mechanism: Get all keys of a shop. Read the new CSV. Insert the rows from the csv, which keys are not present and delete the rows which are not longer present. Update all prices of the products from the csv and set an update_date. I'm measuring a high load value on a few nodes during the update process (which is normal), but one node keeps the high load after the process for a long time. I checked the tpstats and found out, that on this node there are over 50k pending ReadStage tasks. 
All the other nodes don't have that behaviour. I already had this problem on cassandra 0.7, but after upgrading to 0.8 it disappeared. Now it is back. Any suggestions? Thanks, Hans -- Mit freundlichen Grüßen, Johann Höchtl stellv. IT-Leiter Adresse Grafinger Straße 6 81671 München Kontakt Web: www.ic3.dehttp://www.ic-drei.de/ E-Mail: h.hoec...@ic-drei.demailto:h.hoec...@ic-drei.de Tel.: 089 638 666 89 - 0 Fax: 089 638 666 89 - 20 [cid:image001.jpg@01CCA9E7.8ED1B7B0]http://www.ic3.de/ Wichtige Hinweise Hinweis: Diese Nachricht kann vertrauliche/rechtlich geschützte Informationen enthalten. Sofern Sie nicht der in dieser Nachricht genannte Adressat (oder ein für die Weiterleitung der Nachricht an den Adressaten Verantwortlicher) sind, ist es Ihnen untersagt, diese Nachricht zu kopieren oder an Dritte weiterzugeben. In diesem Fall löschen Sie bitte diese Nachricht und informieren Sie den Absender dieser Nachricht per Antwort-Nachricht. Die ungenehmigte Nutzung oder Verbreitung dieser Nachricht ganz oder in Teilen ist strengstens untersagt. Bitte beachten Sie ferner, dass E-Mails leicht manipuliert werden können. Daher ist der Inhalt dieser Nachricht nicht rechtlich verbindlich. Der Inhalt dieser Nachricht ist nur rechtsverbindlich, wenn er schriftlich bestätigt wird. IC3 Ltd. kann nicht für die unrichtige oder unvollständige Übermittlung von in dieser Nachricht enthaltenen Informationen, für Verzögerungen beim Erhalt dieser Nachricht oder für Schädigungen Ihrer EDV-Systeme durch diese Nachricht verantwortlich gemacht werden. IC3 Ltd. übernimmt keinerlei Gewähr dafür, dass diese Nachricht nicht verändert wurde und keinerlei Gewähr dafür, dass diese Nachricht nicht von Viren befallen, abgefangen oder in sie anderweitig eingegriffen wurde. Important notice Disclaimer: Privileged/Confidential Informations may be contained in this message. 
flushwriter all time blocked
Hi all, on a 0.7.8 cluster, in tpstats I can see the flushwriter stage having several tasks in state all-time-blocked (immediately after a node restart it's 8, but it grows over time to around 300). What does it mean (or how can I find out), and what can I do about it? -- YOOCHOOSE GmbH, Roland Gude, Software Engineer, Im Mediapark 8, 50670 Köln, +49 221 4544151 (Tel), +49 221 4544159 (Fax), +49 171 7894057 (Mobil), Email: roland.g...@yoochoose.com, WWW: www.yoochoose.com. YOOCHOOSE GmbH, Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann, Handelsregister: Amtsgericht Köln HRB 65275, Ust-Ident-Nr: DE 264 773 520, Sitz der Gesellschaft: Köln
AW: flushwriter all time blocked
Hi, this still leaves me puzzled. Is it a bad thing? Why is it happening? And what does "blocked before being accepted" mean? Does it mean Cassandra did not even try to put the task into a queue? Thanks for enlightening me, Roland

-----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Monday, 29 August 2011 15:10 To: user@cassandra.apache.org Subject: Re: flushwriter all time blocked

The javadoc for the mbeans explains:

/**
 * Get the number of tasks that had blocked before being accepted (or
 * rejected).
 */
public int getTotalBlockedTasks();

/**
 * Get the number of tasks currently blocked, waiting to be accepted by
 * the executor (because all threads are busy and the backing queue is full).
 */
public int getCurrentlyBlockedTasks();

On Mon, Aug 29, 2011 at 3:39 AM, Roland Gude <roland.g...@yoochoose.com> wrote: Hi all, on a 0.7.8 cluster, in tpstats I can see the flushwriter stage having several tasks in state all-time-blocked (immediately after a node restart it's 8 but grows over time to around 300). What does it mean (or how can I find out) and what can I do about it? -- Jonathan Ellis, Project Chair, Apache Cassandra, co-founder of DataStax, the source for professional Cassandra support, http://www.datastax.com
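The javadoc above can be made concrete with a small standalone sketch (plain Python, not Cassandra code): when all workers of an executor are busy and its backing queue is full, a submitting thread must wait before its task is even accepted, and a counter analogous to getTotalBlockedTasks() is incremented for each such wait. The queue size and timings here are invented for illustration.

```python
import queue
import threading
import time

task_queue = queue.Queue(maxsize=2)   # small backing queue, like the flush queue
total_blocked = 0                     # analogue of getTotalBlockedTasks()

def submit(task):
    """Try a non-blocking put first; if the queue is full, count the
    submission as 'blocked' and fall back to a blocking put."""
    global total_blocked
    try:
        task_queue.put_nowait(task)
    except queue.Full:
        total_blocked += 1            # task blocked before being accepted
        task_queue.put(task)          # now wait until a slot frees up

def worker():
    # Single slow consumer, standing in for a busy flush writer thread.
    while True:
        task = task_queue.get()
        if task is None:
            break
        time.sleep(0.01)              # simulate a slow flush
        task_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()
for i in range(10):
    submit(i)
task_queue.join()
task_queue.put(None)

print("total blocked:", total_blocked)
```

So a growing all-time-blocked count means flushes are being requested faster than the flush writers (plus their bounded queue) can absorb them; the tasks are not lost, the submitter just had to wait.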
AW: IndexSliceQuery issue - ReadStage piling up (looks like deadlock/infinite loop or similar)
Yes, I can reproduce this behavior. If I issue a query like this (on 0.7.8 with the patch for CASSANDRA-2964 applied): [default@demo] get users where birth_date = 1968 and state = 'UT'; with an index on birth_date but no index on state, I do not get results (actually I get '0 rows') even though there are rows which satisfy all clauses. However, if I repeat this several times, several of the nodes start piling up pending reads (tpstats shows some 8 reads pending). And even though the nodes are not able to fulfill (read) requests anymore, they are not marked as down by the gossiper. Overall this results in an unusable cluster. If I do the same thing on a 0.7.5 cluster, Cassandra logs a NullPointerException and the cli returns null, but the cluster stays functional.

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, 10 August 2011 23:48 To: user@cassandra.apache.org Subject: Re: IndexSliceQuery issue - ReadStage piling up (looks like deadlock/infinite loop or similar)

Are you still having a problem? I'm a bit confused about what you are saying. Cheers - Aaron Morton, Freelance Cassandra Developer, @aaronmorton, http://www.thelastpickle.com

On 10 Aug 2011, at 03:33, Roland Gude wrote: Hi, I experience issues when doing an IndexSliceQuery with multiple expressions if one of the expressions is on a non-indexed column. I did the equivalent of this example (but with my data) from http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes: Secondary indexes automate this.
Let's add some state data:

[default@demo] set users[bsanderson][state] = 'UT';
[default@demo] set users[prothfuss][state] = 'WI';
[default@demo] set users[htayler][state] = 'UT';

Note that even though state is not indexed yet, we can include the new state data in a query as long as another column in the query is indexed:

[default@demo] get users where state = 'UT';
No indexed columns present in index clause with operator EQ
[default@demo] get users where state = 'UT' and birth_date > 1970;
No indexed columns present in index clause with operator EQ
[default@demo] get users where birth_date = 1968 and state = 'UT';
---
RowKey: htayler
=> (column=birth_date, value=1968, timestamp=1291334765649000)
=> (column=full_name, value=Howard Tayler, timestamp=129133474916)
=> (column=state, value=5554, timestamp=1291334890708000)

On 0.7.8 (with CASSANDRA-2964 applied) this example will not return any data, but '0 rows'. I repeated the query multiple times with different variations of the values, which should all have returned data, but eventually I ended up with the cluster having 8 reads pending on some of the nodes. On 0.7.5 the query will result in a NullPointerException being thrown and null returned in the cli:

ERROR [ReadStage:258] 2011-08-09 16:03:27,153 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:258,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.ColumnFamily.addAll(ColumnFamily.java:131)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1615)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
    ... 4 more
ERROR [ReadStage:258] 2011-08-09 16:03:27,153 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:258,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.ColumnFamily.addAll(ColumnFamily.java:131)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1615)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
    ... 4 more

Can anybody reproduce this? Greetings, Roland
IndexSliceQuery issue - ReadStage piling up (looks like deadlock/infinite loop or similar)
Hi, I experience issues when doing an IndexSliceQuery with multiple expressions if one of the expressions is on a non-indexed column. I did the equivalent of this example (but with my data) from http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes: Secondary indexes automate this. Let's add some state data:

[default@demo] set users[bsanderson][state] = 'UT';
[default@demo] set users[prothfuss][state] = 'WI';
[default@demo] set users[htayler][state] = 'UT';

Note that even though state is not indexed yet, we can include the new state data in a query as long as another column in the query is indexed:

[default@demo] get users where state = 'UT';
No indexed columns present in index clause with operator EQ
[default@demo] get users where state = 'UT' and birth_date > 1970;
No indexed columns present in index clause with operator EQ
[default@demo] get users where birth_date = 1968 and state = 'UT';
---
RowKey: htayler
=> (column=birth_date, value=1968, timestamp=1291334765649000)
=> (column=full_name, value=Howard Tayler, timestamp=129133474916)
=> (column=state, value=5554, timestamp=1291334890708000)

On 0.7.8 (with CASSANDRA-2964 applied) this example will not return any data, but '0 rows'.
I repeated the query multiple times with different variations of the values, which should all have returned data, but eventually I ended up with the cluster having 8 reads pending on some of the nodes. On 0.7.5 the query will result in a NullPointerException being thrown and null returned in the cli:

ERROR [ReadStage:258] 2011-08-09 16:03:27,153 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:258,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.ColumnFamily.addAll(ColumnFamily.java:131)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1615)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
    ... 4 more
ERROR [ReadStage:258] 2011-08-09 16:03:27,153 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:258,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.ColumnFamily.addAll(ColumnFamily.java:131)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1615)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
    ... 4 more

Can anybody reproduce this? Greetings, Roland
AW: results of index slice query
Hi, I have so far not been able to reproduce this bug on any cluster other than our production cluster, which started showing the behavior only after the upgrade from 0.7.5 to 0.7.7. I have attached logs to the issue, but I have absolutely no clue how to move forward. Any ideas, anybody?

-----Original Message----- From: Roland Gude [mailto:roland.g...@yoochoose.com] Sent: Thursday, 28 July 2011 11:22 To: user@cassandra.apache.org Subject: AW: results of index slice query

Created https://issues.apache.org/jira/browse/CASSANDRA-2964

-----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, 27 July 2011 17:35 To: user@cassandra.apache.org Subject: Re: results of index slice query

Sounds like a Cassandra bug to me. On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude <roland.g...@yoochoose.com> wrote: Hi, I was just noticing that when I do an IndexSliceQuery with the index column not in the slice range, the index column will be returned anyway. Is this behavior intended or is it a bug (if so, is it a Cassandra bug or a hector bug)? I am using Cassandra 0.7.7 and hector 0.7-26. Greetings, Roland
AW: results of index slice query
Created https://issues.apache.org/jira/browse/CASSANDRA-2964

-----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, 27 July 2011 17:35 To: user@cassandra.apache.org Subject: Re: results of index slice query

Sounds like a Cassandra bug to me. On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude <roland.g...@yoochoose.com> wrote: Hi, I was just noticing that when I do an IndexSliceQuery with the index column not in the slice range, the index column will be returned anyway. Is this behavior intended or is it a bug (if so, is it a Cassandra bug or a hector bug)? I am using Cassandra 0.7.7 and hector 0.7-26. Greetings, Roland
results of index slice query
Hi, I was just noticing that when I do an IndexSliceQuery with the index column not in the slice range, the index column will be returned anyway. Is this behavior intended or is it a bug (if so, is it a Cassandra bug or a hector bug)? I am using Cassandra 0.7.7 and hector 0.7-26. Greetings, Roland
AW: Multi-type column values in single CF
You could do the serialization for all your supported datatypes yourself (many libraries for serialization are available, and a pretty thorough benchmark of them can be found here: https://github.com/eishay/jvm-serializers/wiki) and prepend the serialized bytes with an identifier for your datatype. This would not avoid casting, but it would still perform better than serializing to strings as is done in your example. Prepending the values with the type id seems better to me, because you can be sure that a new insertion to some field overwrites the correct column even if the type changed.

-----Original Message----- From: osishkin osishkin [mailto:osish...@gmail.com] Sent: Sunday, 3 July 2011 13:52 To: user@cassandra.apache.org Subject: Multi-type column values in single CF

Hi all, I need to store column values of various data types in a single column family, i.e. I have column values that are integers, others that are strings, and maybe more later. All column names are strings (no comparator problem for me). The thing is, I need to store unstructured data - I do not have fixed and known-in-advance column names, so I cannot use a fixed static map for casting the values back to their original type on retrieval from Cassandra. My immediate naive thought is to simply prefix every column name with the type the value needs to be cast back to. For example, I'll do the following conversion to the columns of some key - {'attr1': 'val1', 'attr2': 100} ~ {'str_attr1': 'val1', 'int_attr2': '100'} - and only then send it to Cassandra. This way I know what to cast it back to. But all this casting back and forth on the client side seems very bad for performance. Another option is to split the columns into dedicated column families with matching validation types - a column family for integer values, one for strings, one for timestamps etc.
But that does not seem very efficient either (and worse for any rollback mechanism), since now I have to perform several get calls on multiple CFs where once I had only one. I thought perhaps someone has encountered a similar situation in the past, and can offer some advice on the best course of action. Thank you, Osi
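The type-id-prefix approach suggested in the reply above can be sketched in a few lines. This is a hedged illustration, not production serialization code: the one-byte type ids and the helper names are invented here, and only three types are covered. The point is that the column value keeps one fixed (BytesType) representation, and a later write of a different type simply overwrites the same column.

```python
import struct

# Invented type identifiers for this sketch.
TYPE_INT, TYPE_STR, TYPE_FLOAT = 0x01, 0x02, 0x03

def pack_value(value):
    """Serialize a value and prepend a one-byte type identifier."""
    if isinstance(value, bool):
        raise TypeError("bool not supported in this sketch")
    if isinstance(value, int):
        return bytes([TYPE_INT]) + struct.pack(">q", value)     # 8-byte big-endian int
    if isinstance(value, float):
        return bytes([TYPE_FLOAT]) + struct.pack(">d", value)   # 8-byte IEEE double
    if isinstance(value, str):
        return bytes([TYPE_STR]) + value.encode("utf-8")
    raise TypeError("unsupported type: %r" % type(value))

def unpack_value(data):
    """Read the type identifier and restore the original Python type."""
    type_id, payload = data[0], data[1:]
    if type_id == TYPE_INT:
        return struct.unpack(">q", payload)[0]
    if type_id == TYPE_FLOAT:
        return struct.unpack(">d", payload)[0]
    if type_id == TYPE_STR:
        return payload.decode("utf-8")
    raise ValueError("unknown type id: %d" % type_id)

# Round trips: the reader needs no per-column schema knowledge.
assert unpack_value(pack_value(100)) == 100
assert unpack_value(pack_value("val1")) == "val1"
assert unpack_value(pack_value(3.5)) == 3.5
```

Compared with encoding the type into the column name ('int_attr2'), this keeps column names stable, so overwriting a field with a value of a different type replaces the same column instead of leaving a stale sibling behind.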
AW: Column value type
There is a comparator type (for the name) and a validation type (for the value). If you have set the validation to be UTF8, you can only store data there that is valid UTF-8. The default validation is BytesType, so it should accept everything unless otherwise specified. I cannot say anything regarding pycassa client-side validation, though.

-----Original Message----- From: osishkin osishkin [mailto:osish...@gmail.com] Sent: Wednesday, 22 June 2011 13:14 To: user@cassandra.apache.org Subject: Column value type

Is there a limitation on the data type of a column value (not column name) in Cassandra? I'm saving data using a pycassa client, for a UTF8 column family, and I get an error when I try saving integer data values. Only when I convert the values to strings can I save the data. Looking at the pycassa code, it seems to prevent me from sending non-string data. It doesn't make sense to me, since as far as I understood, the type should apply only to column names (for comparison etc.). Am I wrong? Thank you
Re: range query vs slice range query
I cannot display the book page you are referring to, but your general understanding is correct. A range refers to several rows, a slice refers to several columns. A range slice is a combination of both: from all rows in a range, get a specific slice of columns. On 25.05.2011 at 10:43, david lee <iecan...@gmail.com> wrote: hi guys, I'm reading up on the book Cassandra - The Definitive Guide and I don't seem to understand what it says about ranges and slices. My understanding is: a range, as in a mathematical range defining a subset of an ordered set of elements, in Cassandra typically means a range of rows, whereas a slice means a range of columns. A range query refers to a query that retrieves a range of rows, whereas a slice range query refers to a query that retrieves a range of columns within a row. I may be talking total nonsense, but I really am more confused after reading this portion of the book: http://books.google.com/books?id=MKGSbCbEdg0Cpg=PA134lpg=PA134dq=cassandra+%22range+query%22+%22range+slice%22source=blots=XoPB4uA60usig=uDDoQe0FRkQobHnr-vPvvQ3B8TQhl=enei=ub3cTcvGLZLevQOuxs3CDwsa=Xoi=book_resultct=resultresnum=4ved=0CCwQ6AEwAw#v=onepageq=cassandra%20%22range%20query%22%20%22range%20slice%22f=false many thanx in advance, david
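The terminology in the reply above can be modelled in plain Python (no Cassandra involved) - a range selects rows, a slice selects columns, and a range slice combines both. The data and helper names below are invented for illustration.

```python
# Toy data model: row key -> {column name -> value}.
rows = {
    "a": {"c1": 1, "c2": 2, "c3": 3},
    "b": {"c1": 4, "c2": 5},
    "c": {"c2": 6, "c3": 7},
}

def get_range(start_key, end_key):
    """Range: all rows whose key falls in [start_key, end_key]."""
    return {k: v for k, v in rows.items() if start_key <= k <= end_key}

def get_slice(row, start_col, end_col):
    """Slice: all columns of one row whose name falls in [start_col, end_col]."""
    return {c: v for c, v in row.items() if start_col <= c <= end_col}

def get_range_slices(start_key, end_key, start_col, end_col):
    """Range slice: a column slice from each row in a key range."""
    return {k: get_slice(v, start_col, end_col)
            for k, v in get_range(start_key, end_key).items()}

# From rows "a".."b", take columns "c2".."c3".
result = get_range_slices("a", "b", "c2", "c3")
assert result == {"a": {"c2": 2, "c3": 3}, "b": {"c2": 5}}
```

This matches Jonathan's one-line summary later in the thread: get_range_slices is the API to get a slice (of columns) from each of a range (of rows).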
Re: range query vs slice range query
That is correct. The random partitioner orders rows according to their MD5 sum. On 25.05.2011 at 16:11, Robert Jackson <robe...@promedicalinc.com> wrote: Also, it is my understanding that if you are not using OrderPreservingPartitioner, a get_range_slices may not return what you would expect. With the RandomPartitioner you can iterate over the complete list by using the last row key as the start for subsequent requests, but if you are using a single query, you will be returned all the rows where the returned row key's MD5 is between the MD5 of the start row key and the stop row key. Reference: http://wiki.apache.org/cassandra/FAQ - "Why aren't range slices/sequential scans giving me the expected results?" Robert Jackson

From: Jonathan Ellis <jbel...@gmail.com> To: user@cassandra.apache.org Sent: Wednesday, May 25, 2011 8:54:34 AM Subject: Re: range query vs slice range query

get_range_slices is the api to get a slice (of columns) from each of a range (of rows). On Wed, May 25, 2011 at 3:42 AM, david lee <iecan...@gmail.com> wrote: hi guys, I'm reading up on the book Cassandra - The Definitive Guide and I don't seem to understand what it says about ranges and slices. My understanding is: a range, as in a mathematical range defining a subset of an ordered set of elements, in Cassandra typically means a range of rows, whereas a slice means a range of columns. A range query refers to a query that retrieves a range of rows, whereas a slice range query refers to a query that retrieves a range of columns within a row. I may be talking total nonsense, but I really am more confused after reading this portion of the book: http://books.google.com/books?id=MKGSbCbEdg0Cpg=PA134lpg=PA134dq=cassandra+%22range+query%22+%22range+slice%22source=blots=XoPB4uA60usig=uDDoQe0FRkQobHnr-vPvvQ3B8TQhl=enei=ub3cTcvGLZLevQOuxs3CDwsa=Xoi=book_resultct=resultresnum=4ved=0CCwQ6AEwAw#v=onepageq=cassandra%20%22range%20query%22%20%22range%20slice%22f=false many thanx in advance, david -- Jonathan Ellis, Project Chair, Apache Cassandra, co-founder of DataStax, the source for professional Cassandra support, http://www.datastax.com
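The MD5 ordering described above can be demonstrated without a cluster. RandomPartitioner places each row at the position of the MD5 hash of its key, so a "range" scan walks the ring in token order, not in key order; the sample keys below are invented for illustration.

```python
import hashlib

def token(key):
    """Token of a row key under RandomPartitioner: its MD5 as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

keys = ["alice", "bob", "carol", "dave"]
by_key = sorted(keys)                  # lexical key order
by_token = sorted(keys, key=token)     # the order a range scan walks the ring

print("key order:  ", by_key)
print("token order:", by_token)
# The two orders generally differ, which is why iterating with the last
# returned row key as the next start key is the safe way to scan everything.
assert sorted(by_token) == by_key      # same set of rows, different order
```

This is also why a single get_range_slices between two arbitrary keys returns "all rows whose key's MD5 is between the MD5 of the start and stop key", as Robert notes above.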
AW: Does anyone have Cassandra running on OpenSolaris?
Use bash as a shell #bash bin/cassandra -f -Ursprüngliche Nachricht- Von: Jeffrey Kesselman [mailto:jef...@gmail.com] Gesendet: Montag, 9. Mai 2011 17:12 An: user@cassandra.apache.org Betreff: Does anyone have Cassandra running on OpenSolaris? I get this error: bin/cassandra: syntax error at line 29: `system_memory_in_mb=$' unexpected Thanks JK -- It's always darkest just before you are eaten by a grue.
Re: low performance inserting
Hi, not sure this is the cause of your bad performance, but you are measuring data creation and insertion together. Your data creation involves lots of class casts, which are probably quite slow. Try timing only the b.send part and see how long that takes. Roland. On 03.05.2011 at 12:30, charles THIBAULT <charl.thiba...@gmail.com> wrote: Hello everybody, first: sorry for my English in advance!! I'm getting started with Cassandra on a 5-node cluster, inserting data with the pycassa API. I've read everywhere on the internet that Cassandra's performance is better than MySQL's because writes only append to commit log files. When I try to insert 100,000 rows with 10 columns per row with batch insert, I get this result: 27 seconds. But with MySQL (load data infile) this takes only 2 seconds (using indexes). Here is my configuration: cassandra version: 0.7.5; nodes: 192.168.1.210, 192.168.1.211, 192.168.1.212, 192.168.1.213, 192.168.1.214; seed: 192.168.1.210. My script:

#!/usr/bin/env python
import pycassa
import time
import random
from cassandra import ttypes

pool = pycassa.connect('test', ['192.168.1.210:9160'])
cf = pycassa.ColumnFamily(pool, 'test')
b = cf.batch(queue_size=50, write_consistency_level=ttypes.ConsistencyLevel.ANY)
tps1 = time.time()
for i in range(100000):
    columns = dict()
    for j in range(10):
        columns[str(j)] = str(random.randint(0, 100))
    b.insert(str(i), columns)
b.send()
tps2 = time.time()
print("execution time: " + str(tps2 - tps1) + " seconds")

What am I doing wrong?
AW: AW: Two versions of schema
Yeah, it happens from time to time, even when everything seems fine, that schema changes don't work correctly. But it's always repairable with the described procedure; therefore, having an operator available is a must-have, I think. Drain is a nodetool command: the node flushes its data and stops accepting new writes. This just speeds up bringing the node back up again in this case. Probably a flush is equally acceptable.

-----Original Message----- From: mcasandra [mailto:mohitanch...@gmail.com] Sent: Monday, 18 April 2011 18:27 To: cassandra-u...@incubator.apache.org Subject: Re: AW: Two versions of schema

In my case all hosts were reachable and I ran nodetool ring before running the schema update. I don't think it was because of a node being down. I think for some reason it just took over 10 secs because I was reducing key_cache from 1M to 1000. I think it might take long to trim the keys, so the 10 sec default may not be the right approach. What is drain? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Two-versions-of-schema-tp6277365p6284276.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Site Not Surviving a Single Cassandra Node Crash
Not sure about that Hector version, but there was a Hector bug where Hector did not stop using a dead node as proxy and did not do proper load balancing of the requests. If you enable trace logs for Hector, you can see which nodes it uses for requests. If there is a newer 0.6 Hector, you should give it a try. Furthermore, I suggest bringing down one node and requesting data with the cli. If that works, it is probably the Hector bug. On 10.04.2011 at 06:57, Patricio Echagüe <patric...@gmail.com> wrote: What is the consistency level you are using? And as Ed said, if you can provide the stacktrace that would help too. On Sat, Apr 9, 2011 at 7:02 PM, aaron morton <aa...@thelastpickle.com> wrote: btw, the nodes are a tad out of balance - was that deliberate? http://wiki.apache.org/cassandra/Operations#Token_selection http://wiki.apache.org/cassandra/Operations#Load_balancing Aaron. On 10 Apr 2011, at 08:44, Ed Anuff wrote: Sounds like the problem might be on the hector side. Lots of hector users on this list, but usually not a bad idea to ask on hector-us...@googlegroups.com (cc'd). "The jetty servers stop responding" is a bit vague; somewhere in your logs is an error message that should shed some light on where things are going awry. If you can find the exception that's being thrown in hector and post it, it'd make it much easier to help you out. Ed. On Sat, Apr 9, 2011 at 12:11 PM, Vram Kouramajian <vram.kouramaj...@gmail.com> wrote: The hector clients are used as part of our jetty servers. And the jetty servers stop responding when one of the Cassandra nodes goes down.
Vram. On Sat, Apr 9, 2011 at 11:54 AM, Joe Stump <j...@joestump.net> wrote: Did the Cassandra cluster go down, or did you start getting failures from the client when it routed queries to the downed node? The key in the client is to keep working around the ring if the initial node is down. --Joe. On Apr 9, 2011, at 12:52 PM, Vram Kouramajian wrote: We have 5 Cassandra nodes with the following configuration: Cassandra version: 0.6.11; number of nodes: 5; replication factor: 3; client: Hector 0.6.0-14; write consistency level: Quorum; read consistency level: Quorum. Ring topology (address, status, load, owns, token):

192.168.89.153  Up  4.15 GB  33.87%  20237398133070283622632741498697119875
192.168.89.155  Up  5.17 GB  18.29%  51358066040236348437506517944084891398
192.168.89.154  Up  7.41 GB  33.97%  109158969152851862753910401160326064203
192.168.89.152  Up  5.07 GB   6.34%  119944993359936402983569623214763193674
192.168.89.151  Up  4.22 GB   7.53%  132756707369141912386052673276321963528

We believe that our setup should survive the crash of one of the Cassandra nodes. But we had a few crashes, and the system stopped functioning until we brought the Cassandra nodes back. Any clues? Vram
Re: Atomicity Strategies
A strategy that should cover at least some use cases is roughly like this: given that CF A and CF B should be in sync, on a write 'a' to CF A, add another column 'synchronisation_token' and write a TimeUUID 'T' (or a timestamp, or some other value that allows (time-based) ordering) as its value. On the related write to CF B, write the same token. When reading, check client-side whether the tokens match, and re-read the data with the lower token until they do. Roland. On 10.04.2011 at 03:53, aaron morton <aa...@thelastpickle.com> wrote: My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation: http://en.wikipedia.org/wiki/Isolation_(database_systems) I think the issue here is more about atomicity: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic - "We cannot guarantee that all or none of the mutations in your batch are completed." There is some work in this area, though: https://issues.apache.org/jira/browse/CASSANDRA-1684 AFAIK the best approach now is to work at Quorum and write your code to handle missing relations. Also, Cassandra does a lot of work up front, before the write starts, to ensure it will succeed; failures during a write will probably be due to a SW/HW failure or overload on a node that gossip has not picked up. Retrying is the recommended approach when a request fails. Hope that helps. Aaron. On 9 Apr 2011, at 15:58, Dan Washusen wrote: Here's a good writeup on how fightmymonster.com does it...
http://ria101.wordpress.com/category/nosql-databases/locking/ -- Dan Washusen, "Make big files fly", visit digitalpigeon.com. On Saturday, 9 April 2011 at 11:53 AM, Alex Araujo wrote: On 4/8/11 5:46 PM, Drew Kutcharian wrote: I'm interested in this too, but I don't think this can be done with Cassandra alone. Cassandra doesn't support transactions. I think hector can retry operations, but I'm not sure about the atomicity of the whole thing. On Apr 8, 2011, at 1:26 PM, Alex Araujo wrote: Hi, I was wondering if there are any patterns/best practices for creating atomic units of work when dealing with several column families and their inverted indices. For example, if I have Users and Groups column families and did something like: Users.insert( user_id, columns ); UserGroupTimeline.insert( group_id, { timeuuid() : user_id } ); UserGroupStatus.insert( group_id + ":" + user_id, { Active : True } ); UserEvents.insert( timeuuid(), { user_id : user_id, group_id : group_id, event_type : join } ) - would I want the client to retry all subsequent operations that failed against other nodes after n succeeded, maintain an undo queue of operations to run, batch the mutations and choose a strong consistency level, some combination of these/others, etc? Thanks, Alex. Thanks Drew. I'm familiar with the lack of transactions and have read about people using ZK (possibly Cages as well?) to accomplish this, but since it seems that inverted indices are commonplace, I'm interested in how anyone is mitigating the lack of atomicity without the use of such tools. It appears that Hector and Pelops have retrying built into their APIs, and I'm fairly confident that proper use of those capabilities may help. Just trying to cover all bases. Hopefully someone can share their approaches and/or experiences. Cheers, Alex.
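Roland's synchronisation-token strategy at the top of this thread can be sketched standalone (plain dicts stand in for the two column families; the counter stands in for a TimeUUID/timestamp). Related writes carry the same monotonically increasing token, and a reader re-reads until both tokens match, so it never acts on a half-applied pair of writes.

```python
import itertools

_token = itertools.count(1)   # stand-in for a TimeUUID / timestamp generator
cf_a, cf_b = {}, {}           # the two column families that must stay in sync

def write_pair(key, value_a, value_b):
    """Write related data to both CFs, stamping both with the same token."""
    t = next(_token)
    cf_a[key] = {"value": value_a, "sync_token": t}
    cf_b[key] = {"value": value_b, "sync_token": t}

def read_pair(key, max_retries=10):
    """Re-read until both CFs show the same token, i.e. a consistent pair."""
    for _ in range(max_retries):
        a, b = cf_a[key], cf_b[key]
        if a["sync_token"] == b["sync_token"]:
            return a["value"], b["value"]
        # Tokens differ: one CF is stale; loop and re-read.
    raise RuntimeError("could not observe a consistent pair")

write_pair("row1", "a-data", "b-data")
assert read_pair("row1") == ("a-data", "b-data")

# Simulate a half-applied update: cf_a got the new write, cf_b did not.
cf_a["row1"] = {"value": "a-data-2", "sync_token": 99}
detected = False
try:
    read_pair("row1", max_retries=2)
except RuntimeError:
    detected = True   # the mismatch was detected instead of silently mixing data
assert detected
```

In a real deployment the re-read loop would back off and the second write would eventually land (or be retried), at which point the tokens match again; the token only detects the inconsistency, it does not repair it.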
Re: Secondary Index keeping track of column names
You could simulate it, though. Just add some meta column with a boolean value indicating whether the referred column is in the row or not, then add an index on that meta column and query for it. I.e. row a: (c=1234),(has_c=Yes); query: list cf where has_c=Yes. On 06.04.2011 at 18:52, Jonathan Ellis jbel...@gmail.com wrote: No, 0.7 indexes handle equality queries; you're basically asking for an IS NOT NULL query. On Wed, Apr 6, 2011 at 11:23 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: In 0.7.X is there a way to have an automatic secondary index which keeps track of what keys contain a certain column? Right now we are keeping track of this manually so we can quickly get all of the rows which contain a given column; it would be nice if it was automatic. -Jeremiah Jeremiah Jordan Application Developer Morningstar, Inc. Morningstar. Illuminating investing worldwide. +1 312 696-6128 voice jeremiah.jor...@morningstar.com www.morningstar.com This e-mail contains privileged and confidential information and is intended only for the use of the person(s) named above. Any dissemination, distribution, or duplication of this communication without prior written consent from Morningstar is strictly prohibited. If you have received this message in error, please contact the sender immediately and delete the materials from any computer. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
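Roland's meta-column workaround can be mimicked without Cassandra; here a dict stands in for the rows and `keys_index` is a hypothetical stand-in for a KEYS secondary index lookup, not a Thrift/Hector call:

```python
# Simulating "IS NOT NULL" with an indexed boolean meta column: every
# row that has column 'c' also gets 'has_c' = 'Yes', and an equality
# query on 'has_c' then answers "which rows contain c?".
rows = {
    "a": {"c": "1234", "has_c": "Yes"},
    "b": {"x": "9"},  # no column 'c', so no meta column either
}

def keys_index(column, value):
    # KEYS indexes support equality lookups only -- mirrored here.
    return sorted(k for k, cols in rows.items() if cols.get(column) == value)

print(keys_index("has_c", "Yes"))  # ['a']
```

The caveat is that the application must write (and on deletion of 'c', remove) the meta column itself; the index only stays truthful if every writer maintains it.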
Re: Strange nodetool repair behaviour
I am experiencing the same behaviour, and had it on previous versions of 0.7 as well. -Original Message- From: Jonas Borgström [mailto:jonas.borgst...@trioptima.com] Sent: Monday, 4 April 2011 12:26 To: user@cassandra.apache.org Subject: Strange nodetool repair behaviour Hi, I have a 6 node 0.7.4 cluster with replication_factor=3 where "nodetool repair keyspace" behaves really strangely. The keyspace contains three column families and about 60GB of data in total (i.e. 30GB on each node). Even though no data has been added or deleted since the last repair, a repair takes hours and the repairing node seems to receive 100+GB worth of sstable data from its neighbouring nodes, i.e. several times the actual data size. The log says things like: "Performing streaming repair of 27 ranges" And a bunch of: "Compacted to filename 22,208,983,964 to 4,816,514,033 (~21% of original)" In the end the repair finishes without any error after a few hours, but even then the active sstables seem to contain lots of redundant data, since the disk usage can be cut in half by triggering a major compaction. All this leads me to believe that something stops the AES from correctly figuring out what data is already on the repairing node and what needs to be streamed from the neighbours. The only thing I can think of right now is that one of the column families contains a lot of large rows that are larger than memtable_throughput, and perhaps that's what's confusing the merkle tree. Anyway, is this a known problem or perhaps expected behaviour? Otherwise I'll try to create a more reproducible test case. Regards, Jonas
Re: too many open files - maybe a fd leak in indexslicequeries
Hi, The open file limit is 1024. Sstable count is somewhere around 20 or so; thread count is in the same order of magnitude, I guess. But lsof shows that deleted sstables still have open file handles. This seems to be the issue, as this number keeps growing. Any ideas? Roland. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, 1 April 2011 06:07 To: user@cassandra.apache.org Cc: Roland Gude; Juergen Link; Johannes Hoerle Subject: Re: too many open files - maybe a fd leak in indexslicequeries Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves; they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there, it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count towards this), thread count, sstable count? On Thu, Mar 31, 2011 at 4:15 PM, Roland Gude roland.g...@yoochoose.com wrote: I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
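Roland's lsof observation (handles to already-deleted sstables) can be checked with a small shell helper. This is a sketch: `count_deleted_handles` is a hypothetical name, and the Cassandra process id has to be supplied by the operator:

```shell
# Sketch: count open handles to already-deleted files, as reported by
# lsof. The function reads `lsof -p <pid>` output on stdin and prints
# how many lines refer to files that were unlinked but are still open.
count_deleted_handles() {
    grep -c '(deleted)' || true
}

# Typical use against a running Cassandra process (PID is an assumption):
#   lsof -p "$CASSANDRA_PID" | count_deleted_handles
```

A steadily growing count across repeated invocations is the symptom described above: file descriptors are held on sstables that compaction has already removed from disk.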
too many open files - maybe a fd leak in indexslicequeries
I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does anybody know about it? Where could I look? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: problems while TimeUUIDType-index-querying with two expressions
Actually it's not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists. As far as I can see, there is nothing wrong with the IndexExpression: using two index expressions with key=TimeUUID and value=anything does not work; using one index expression (either one of the two) alone works fine. I refactored Johannes' code into a JUnit test case. It needs the cluster configured as described in Johannes' mail. There are three cases: two with one of the index expressions, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs. Bye, roland From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, 15 March 2011 07:54 To: user@cassandra.apache.org Cc: Juergen Link; Roland Gude; her...@datastax.com Subject: Re: problems while TimeUUIDType-index-querying with two expressions Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328 Aaron On 15 Mar 2011, at 16:52, Jonathan Ellis wrote: Sounds like we should send an InvalidRequestException then. On Mon, Mar 14, 2011 at 8:06 PM, aaron morton aa...@thelastpickle.com wrote: It's failing when comparing two TimeUUID values because one of them is not properly formatted. In this case it's comparing a stored value with the value passed in the get_indexed_slice() query expression. I'm going to assume it's the value passed for the expression. When you create the IndexedSlicesQuery this is incorrect: IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, bytesSerializer, bytesSerializer); Use a UUIDSerializer for the last param and then pass the UUID you want to build the expression.
Rather than the string/byte thing you are passing. Hope that helps. Aaron On 15 Mar 2011, at 04:17, Johannes Hoerle wrote: Hi all, in order to improve our queries, we started to use IndexedSliceQueries from the hector project (https://github.com/zznate/hector-examples). I followed the instructions for creating IndexedSlicesQuery with GetIndexedSlices.java. I created the corresponding CF in a keyspace called Keyspace1 (create keyspace Keyspace1;) with: create column family Indexed1 with column_type='Standard' and comparator='UTF8Type' and keys_cached=20 and read_repair_chance=1.0 and rows_cached=2 and column_metadata=[{column_name: birthdate, validation_class: LongType, index_name: dateIndex, index_type: KEYS},{column_name: birthmonth, validation_class: LongType, index_name: monthIndex, index_type: KEYS}]; and the example GetIndexedSlices.java worked fine. Output of CF Indexed1: --- [default@Keyspace1] list Indexed1; Using default limit of 100 --- RowKey: fake_key_12 => (column=birthdate, value=1974, timestamp=1300110485826059) => (column=birthmonth, value=0, timestamp=1300110485826060) => (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056) => (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057) => (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058) --- RowKey: fake_key_8 => (column=birthdate, value=1974, timestamp=1300110485826039) => (column=birthmonth, value=8, timestamp=1300110485826040) => (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036) => (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037) => (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038) --- Now to the problem: As we have another column format in our cluster (using TimeUUIDType as comparator in the CF definition), I adapted the application to our schema on a cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index (--1000--) and another one for a userid index (0001--1000--). It can be created with: create column family ByUser with column_type='Standard' and comparator='TimeUUIDType' and keys_cached=20 and read_repair_chance=1.0 and rows_cached=2 and column_metadata=[{column_name: --1000--, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 0001--1000--, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}]; which looks in the cluster using cassandra-cli like this: [default@Keyspace1] describe keyspace; Keyspace: Keyspace1: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: ByUser Columns sorted
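Aaron's diagnosis earlier in this thread (the expression value was serialized from the UUID's string form instead of its raw bytes) is easy to see outside Cassandra: a TimeUUID comparator expects the 16 raw bytes of the UUID, while serializing the 36-character string form yields a byte sequence the comparator cannot parse. A minimal Python illustration using only the standard library:

```python
import uuid

u = uuid.uuid1()  # a time-based (version 1) UUID, i.e. a TimeUUID

raw = u.bytes                       # what a UUID serializer sends: 16 bytes
as_string = str(u).encode("ascii")  # string-serialized form: 36 bytes

# The two encodings never coincide, mirroring the "not properly
# formatted" comparison failure reported in the Cassandra logs.
print(len(raw), len(as_string), raw == as_string)  # 16 36 False
```

The same reasoning is why Hector's `UUIDSerializer` (rather than a string/bytes serializer) is the suggested fix above.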
Re: can't seem to figure out secondary index definition
Yes, it has such a clause. I am very certain that this is not my code, because the very same program works against a cluster if the index is created with the cli, and it does not when the index is configured via cassandra.yaml. My assumption is that the index creation from the configuration file is flawed (it does not seem to use the same code, as the configuration parameters are named differently); I suspect it creates the index for the wrong column. Greetings, Roland. On 17.02.2011 at 21:46, Nate McCall n...@datastax.com wrote: How are you constructing the IndexedSlicesQuery? Does it have an equals clause with that UUID as the column name? On Thu, Feb 17, 2011 at 11:32 AM, Roland Gude roland.g...@yoochoose.com wrote: Hi again, i am still having trouble with this. If I define the index using the cli with these commands: create column family A with column_type='Standard' and comparator='TimeUUIDType' and keys_cached=20 and read_repair_chance=1.0 and rows_cached=0.0 and column_metadata=[{column_name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]; create column family B with column_type='Standard' and comparator='TimeUUIDType' and keys_cached=20 and read_repair_chance=1.0 and rows_cached=0.0 and column_metadata=[{column_name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]; I can do IndexedSliceQueries as expected. In my unit tests, where I use an embedded Cassandra instance configured via yaml like this: - column_metadata: [{name: --1000--, validator_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] compare_with: TimeUUIDType gc_grace_seconds: 864000 keys_cached: 0.0 max_compaction_threshold: 32 min_compaction_threshold: 4 name: A read_repair_chance: 1.0 rows_cached: 0.0 - column_metadata: [{name: --1000--, validator_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] compare_with: TimeUUIDType gc_grace_seconds: 864000 keys_cached: 0.0 max_compaction_threshold: 32 min_compaction_threshold: 4 name: B
read_repair_chance: 1.0 rows_cached: 0.0 I get these Exceptions: 18:23:55.973 [CassandraDataFetcher-queries] ERROR c.y.s.c.i.event.CassandraDataFetcher - Query me.prettyprint.cassandra.model.IndexedSlicesQuery@1bbd3e2 failed, stop query. me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:No indexed columns present in index clause with operator EQ) at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$12.execute(KeyspaceServiceImpl.java:513) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$12.execute(KeyspaceServiceImpl.java:495) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:161) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getIndexedSlices(KeyspaceServiceImpl.java:517) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:140) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:131) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85) ~[hector-core-0.7.0-26.jar:na] at me.prettyprint.cassandra.model.IndexedSlicesQuery.execute(IndexedSlicesQuery.java:130) ~[hector-core-0.7.0-26.jar:na] at 
com.yoochoose.services.cassandra.internal.event.CassandraDataFetcher$1.onMessage(CassandraDataFetcher.java:60) [classes/:na] at com.yoochoose.services.cassandra.internal.event.CassandraDataFetcher$1.onMessage(CassandraDataFetcher.java:47) [classes/:na] at org.jetlang.channels.ChannelSubscription$1.run(ChannelSubscription.java:31) [jetlang-0.2.1.jar:na] at org.jetlang.core.BatchExecutorImpl.execute(BatchExecutorImpl.java:11) [jetlang-0.2.1.jar:na] at org.jetlang.core.RunnableExecutorImpl.run(RunnableExecutorImpl.java:34) [jetlang-0.2.1
Re: rename index
Thanks. Up to now I could not see any problems with the index names, so for now I will not touch it. If I encounter something I'll let you know. From: Aaron Morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, 16 February 2011 21:00 To: user@cassandra.apache.org Subject: Re: rename index There is no rename, but update column family through the cli or api with just the renamed index should work. The code says it will remove old and add new indexes based on their name. I'm not sure if the name is used for anything other than identifying the index inside the CF. Are the duplicate names causing a problem? Aaron On 17/02/2011, at 6:15 AM, Roland Gude roland.g...@yoochoose.com wrote: Hi, unfortunately i made a copy-paste error and created two indexes called "myindex" on different columnfamilies. What can I do to fix this? Below the output from describe keyspace: ColumnFamily: A Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/14400 Memtable thresholds: 1.1203125/239/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Column Metadata: Column Name: --1000-- Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Name: MyIndex Index Type: KEYS ColumnFamily: B Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/14400 Memtable thresholds: 1.1203125/239/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Column Metadata: Column Name: --1000-- Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Name: MyIndex Index Type: KEYS -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE
GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: can't seem to figure out secondary index definition
.execute(KeyspaceServiceImpl.java:501) ~[hector-core-0.7.0-26.jar:na] ... 19 common frames omitted With the very same code and data. I assume that the column name I give in cassandra.yaml is somehow not interpreted as a TimeUUID or something. Any help would be greatly appreciated. Greetings, roland From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: Tuesday, 15 February 2011 16:22 To: user@cassandra.apache.org Subject: Re: can't seem to figure out secondary index definition Ah, ok. I checked that in the source, and the problem is that you wrote validation_class but it should be validator_class. Augi 2011/2/15 Roland Gude roland.g...@yoochoose.com Yeah, i know about that, but the definition i have is for a cluster that is started/stopped from a unit test with hector's EmbeddedServerHelper, which takes definitions from the yaml. So i'd still like to define the index in the yaml file (it should very well be possible, I guess). From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: Tuesday, 15 February 2011 15:53 To: user@cassandra.apache.org Subject: Re: can't seem to figure out secondary index definition Hi, if you download Cassandra and look into conf/cassandra.yaml then you can see this: "this keyspace definition is for demonstration purposes only. Cassandra will not load these definitions during startup. See http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation." So you should make all schema-related operations via the Thrift/Avro API, or you can use the Cassandra CLI. Augi 2011/2/15 Roland Gude roland.g...@yoochoose.com Hi, i am a little puzzled about the creation of secondary indexes, and the docs in that area are still very sparse. What I am trying to do is: in a columnfamily with a TimeUUID comparator, I want the special timeuuid --1000-- to be indexed.
The value being some UTF8 string on which I want to perform equality checks. What do I need to put in my cassandra.yaml file? Something like this? - column_metadata: [{name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] This gives me this error: 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=keyspaces for JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create property=column_families for JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create property=column_metadata for JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create property=validation_class for JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find property 'validation_class' on class: org.apache.cassandra.config.RawColumnDefinition Bad configuration; unable to start server I am furthermore uncertain whether the column name will be correctly used if given like this. Should I put the byte representation of the uuid there? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
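Michal's fix quoted above (the yaml key is `validator_class`, not `validation_class`) turns the failing fragment into something like the following sketch. The truncated column name and the surrounding values are copied from the mails in this thread; the exact indentation depends on the rest of the file:

```yaml
# Embedded-server cassandra.yaml (0.7.x) column family definition.
# Note "validator_class" -- "validation_class" is the CLI spelling and
# is not a property of RawColumnDefinition, hence the startup error.
- column_metadata: [{name: --1000--,
                     validator_class: UTF8Type,
                     index_name: MyIndex,
                     index_type: KEYS}]
  compare_with: TimeUUIDType
  name: A
```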
rename index
Hi, unfortunately i made a copy-paste error and created two indexes called "myindex" on different columnfamilies. What can I do to fix this? Below the output from describe keyspace: ColumnFamily: A Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/14400 Memtable thresholds: 1.1203125/239/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Column Metadata: Column Name: --1000-- Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Name: MyIndex Index Type: KEYS ColumnFamily: B Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/14400 Memtable thresholds: 1.1203125/239/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Column Metadata: Column Name: --1000-- Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Name: MyIndex Index Type: KEYS -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
can't seem to figure out secondary index definition
Hi, i am a little puzzled about the creation of secondary indexes, and the docs in that area are still very sparse. What I am trying to do is: in a columnfamily with a TimeUUID comparator, I want the special timeuuid --1000-- to be indexed. The value being some UTF8 string on which I want to perform equality checks. What do I need to put in my cassandra.yaml file? Something like this? - column_metadata: [{name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] This gives me this error: 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=keyspaces for JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create property=column_families for JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create property=column_metadata for JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create property=validation_class for JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find property 'validation_class' on class: org.apache.cassandra.config.RawColumnDefinition Bad configuration; unable to start server I am furthermore uncertain whether the column name will be correctly used if given like this. Should I put the byte representation of the uuid there? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: can't seem to figure out secondary index definition
Yeah, i know about that, but the definition i have is for a cluster that is started/stopped from a unit test with hector's EmbeddedServerHelper, which takes definitions from the yaml. So i'd still like to define the index in the yaml file (it should very well be possible, I guess). From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: Tuesday, 15 February 2011 15:53 To: user@cassandra.apache.org Subject: Re: can't seem to figure out secondary index definition Hi, if you download Cassandra and look into conf/cassandra.yaml then you can see this: "this keyspace definition is for demonstration purposes only. Cassandra will not load these definitions during startup. See http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation." So you should make all schema-related operations via the Thrift/Avro API, or you can use the Cassandra CLI. Augi 2011/2/15 Roland Gude roland.g...@yoochoose.com Hi, i am a little puzzled about the creation of secondary indexes, and the docs in that area are still very sparse. What I am trying to do is: in a columnfamily with a TimeUUID comparator, I want the special timeuuid --1000-- to be indexed. The value being some UTF8 string on which I want to perform equality checks. What do I need to put in my cassandra.yaml file? Something like this?
- column_metadata: [{name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] This gives me this error: 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=keyspaces for JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create property=column_families for JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create property=column_metadata for JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create property=validation_class for JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find property 'validation_class' on class: org.apache.cassandra.config.RawColumnDefinition Bad configuration; unable to start server I am furthermore uncertain whether the column name will be correctly used if given like this. Should I put the byte representation of the uuid there? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: can't seem to figure out secondary index definition
Thanks, it works. roland From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: Tuesday, 15 February 2011 16:22 To: user@cassandra.apache.org Subject: Re: can't seem to figure out secondary index definition Ah, ok. I checked that in the source, and the problem is that you wrote validation_class but it should be validator_class. Augi 2011/2/15 Roland Gude roland.g...@yoochoose.com Yeah, i know about that, but the definition i have is for a cluster that is started/stopped from a unit test with hector's EmbeddedServerHelper, which takes definitions from the yaml. So i'd still like to define the index in the yaml file (it should very well be possible, I guess). From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: Tuesday, 15 February 2011 15:53 To: user@cassandra.apache.org Subject: Re: can't seem to figure out secondary index definition Hi, if you download Cassandra and look into conf/cassandra.yaml then you can see this: "this keyspace definition is for demonstration purposes only. Cassandra will not load these definitions during startup. See http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation." So you should make all schema-related operations via the Thrift/Avro API, or you can use the Cassandra CLI. Augi 2011/2/15 Roland Gude roland.g...@yoochoose.com Hi, i am a little puzzled about the creation of secondary indexes, and the docs in that area are still very sparse. What I am trying to do is: in a columnfamily with a TimeUUID comparator, I want the special timeuuid --1000-- to be indexed. The value being some UTF8 string on which I want to perform equality checks. What do I need to put in my cassandra.yaml file? Something like this?
- column_metadata: [{name: --1000--, validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}] This gives me this error: 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal error: null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=keyspaces for JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create property=column_families for JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create property=column_metadata for JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create property=validation_class for JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find property 'validation_class' on class: org.apache.cassandra.config.RawColumnDefinition Bad configuration; unable to start server I am furthermore uncertain whether the column name will be correctly used if given like this. Should I put the byte representation of the uuid there? Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
Re: Data ends up in wrong Columnfamily
Hi, machine A has absolutely no knowledge about the other application, not even the columnfamily name. I was digging into this further: since the data I find in the wrong place has a timestamp in its row key, it was quite easy to find out that the data was relatively old, unfortunately from a time for which I do not have batch mutation logs from the server side. I think this might be related to the "deleted columns reappear" thread, as I saw the following happen: I truncated the columnfamily that contained wrong data using the cassandra-cli; I regenerated the correct data for that columnfamily; I ran repair on a node in the cluster; and the data reappeared. I tried this multiple times, and even tried to truncate the columnfamily using clustertool, on the slight chance that it does something different from the cli when truncating. But so far I have not been successful in removing the data from the cluster. Another strange thing about the issue is that repair seems to blow up the data indefinitely. The columnfamily that contains wrong data holds around 200Kb of correct data before I repair. The complete cluster contains around 6Gb of data (3 nodes, 3Gb each, replication factor 2). After repair on one node, that node contains about 14Gb of data. If I then trigger a repair on the second node, it gets to around 24Gb of data before it fails with OOM. Getting to 24Gb of data seems impossible to me given the amount of data I have written to the cluster. I can only imagine that it is data that was once deleted but keeps reappearing, and while doing so, it reappears in the wrong place. Note that the columnfamily that contains the wrong data did not even exist when the data was first written (it was created with the cli only a couple of days ago, while the oldest row I could find that was not supposed to exist was from January 7th). We did fail to run repair regularly on that cluster in the meantime.
If I find a BatchMutation log that indicates an incorrect write received by the server, I will post it.

Greetings,
Roland

From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, 10 February 2011 21:37
To: user@cassandra.apache.org
Subject: Re: Data ends up in wrong Columnfamily

Not heard of that before; chances are it's a problem in your code. Does machine A even know the other CF name? Can you log the batch mutations you are sending? When it appears in the other CF, is the data complete? There is also a Hector list, perhaps they can help.

Aaron

On 10/02/2011, at 11:58 PM, Roland Gude roland.g...@yoochoose.com wrote:

Hi,

I am experiencing a strange issue. I have two applications writing to Cassandra (into different column families in the same keyspace). The applications reside on different machines and know nothing about the existence of each other. They both produce data and write it to Cassandra with batch mutations using Hector. So far so good, but it regularly happens that data from one application ends up in the columnfamilies reserved for the other application as well as in the intended columnfamily.

Machine A writes to column family CF_A.
Machine B writes to column families CF_B to CF_N.
Regularly, data that was written (according to my application logs) from machine A to CF_A ends up in CF_A and in one of the other columnfamilies.

Any ideas why this could be happening? I am using Cassandra 0.7.0 and Hector 0.7.0-23.

Greetings,
Roland
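Aaron's suggestion to log the batch mutations can be combined with a guard that refuses to send any mutation touching an unexpected column family, so a misdirected write fails loudly on the client instead of silently landing in the wrong CF. A minimal sketch; the client object and its batch_mutate() method are hypothetical stand-ins for whatever Thrift/Hector call your stack actually makes:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mutations")

# The only column family machine A is supposed to touch (placeholder name).
ALLOWED_CFS = {"CF_A"}

def checked_batch_mutate(client, mutation_map):
    """Log every mutation and refuse to send one that touches an
    unexpected column family.  `client` and its batch_mutate() method
    are hypothetical stand-ins for your real Thrift/Hector call; the
    map is keyed row_key -> {column_family: [mutations]}."""
    for row_key, by_cf in mutation_map.items():
        for cf_name, mutations in by_cf.items():
            log.info("mutation: row=%r cf=%s mutations=%d",
                     row_key, cf_name, len(mutations))
            if cf_name not in ALLOWED_CFS:
                raise ValueError("unexpected column family: %s" % cf_name)
    client.batch_mutate(mutation_map)
```

With a guard like this in place, a mutation that somehow names CF_B on machine A raises immediately and leaves a log line to correlate with what later shows up on the server.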
Re: Why is it when I removed a row the RowKey is still there?
It has something to do with the way data is deleted in Cassandra. You are not doing anything wrong.

See here: http://wiki.apache.org/cassandra/FAQ#range_ghosts
Or here, for some more detail: http://wiki.apache.org/cassandra/DistributedDeletes

-----Original Message-----
From: Joshua Partogi [mailto:joshua.j...@gmail.com]
Sent: Friday, 11 February 2011 11:34
To: user@cassandra.apache.org
Subject: Why is it when I removed a row the RowKey is still there?

Hi,

I am very puzzled by this. I removed a row from the client, but when I query the data from the CLI, the row key is still there:

RowKey: 3
---
RowKey: 2
=> (column=6e616d65, value=42696c6c, timestamp=1297338131027004)
---
RowKey: 1
=> (column=6e616d65, value=4a6f65, timestamp=1297420269035522)

Did I do something wrong? What do I need to do in order to completely remove the entire row with its key?

Thank you for the assistance.

Kind regards,
Joshua
--
http://twitter.com/jpartogi
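The behaviour behind those links can be modelled in a few lines: a delete writes a tombstone rather than physically removing the row, so a range query keeps returning the row key (with no columns) until gc_grace_seconds has passed and compaction collects the tombstone. A toy model, not real Cassandra code:

```python
# Toy model of Cassandra's distributed deletes (a sketch, not real
# Cassandra code): a delete writes a tombstone instead of removing the
# row, so a range scan still returns the row key, with no columns,
# until gc_grace_seconds elapses and compaction collects the tombstone.

class ColumnFamily:
    def __init__(self):
        self.rows = {}        # row_key -> {column: (value, timestamp)}
        self.tombstones = {}  # row_key -> deletion timestamp

    def insert(self, key, column, value, ts):
        self.rows.setdefault(key, {})[column] = (value, ts)

    def delete_row(self, key, ts):
        # A delete is just another write: record a tombstone and shadow
        # every column older than it.
        self.tombstones[key] = ts
        cols = self.rows.get(key, {})
        self.rows[key] = {c: v for c, v in cols.items() if v[1] > ts}

    def range_scan(self):
        # Tombstoned keys still appear: these are the "range ghosts".
        return sorted(self.rows)

    def gc(self):
        # Stand-in for gc_grace_seconds expiry plus compaction.
        for key in list(self.tombstones):
            if not self.rows.get(key):
                del self.rows[key]
                del self.tombstones[key]

cf = ColumnFamily()
cf.insert("1", "name", "Joe", 100)
cf.insert("2", "name", "Bill", 101)
cf.insert("3", "name", "Ann", 102)
cf.delete_row("3", 200)
print(cf.range_scan())  # -> ['1', '2', '3']: key "3" lingers as a ghost
cf.gc()
print(cf.range_scan())  # -> ['1', '2']
```

This is exactly the CLI output in the question: RowKey 3 shows up with no columns because only its tombstone remains.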
Re: Data ends up in wrong Columnfamily
Yes, this could very well be the issue. As far as I can see, it is already fixed for 0.7.1. Hopefully that will pass a vote soon.

Thanks,
Roland

-----Original Message-----
From: sc...@scode.org [mailto:sc...@scode.org] On behalf of Peter Schuller
Sent: Friday, 11 February 2011 09:11
To: user@cassandra.apache.org
Subject: Re: Data ends up in wrong Columnfamily

> So far so good, but it regularly happens that data from one application ends up in columnfamilies reserved for the other application as well as the intended columnfamily.

Maybe https://issues.apache.org/jira/browse/CASSANDRA-1992

--
/ Peter Schuller
Re: cassandra solaris x64 support
This is a problem with the start scripts, not with Cassandra itself (or any of its configuration). The shell you are using cannot run the cassandra start script. Try:

# bash bin/cassandra -f

As far as I know, it should work fine then. Actually it should work with a POSIX sh as well, but the legacy Solaris /bin/sh does not understand the $(...) command substitution the script uses to set MAX_HEAP_SIZE, which is exactly where your syntax error points.

-----Original Message-----
From: Xiaobo Gu [mailto:guxiaobo1...@gmail.com]
Sent: Friday, 11 February 2011 16:12
To: user@cassandra.apache.org
Subject: Re: cassandra solaris x64 support

On Fri, Feb 11, 2011 at 10:51 PM, Jonathan Ellis jbel...@gmail.com wrote:
> The vast majority run on Linux, but there are a few people running Cassandra on Solaris, FreeBSD, and Windows.

But I failed to start the one-node test cluster:

# sh bin/cassandra -f
bin/cassandra: syntax error at line 22: `MAX_HEAP_SIZE=$' unexpected

My environment is as follows:

# more /etc/release
Solaris 10 10/09 s10x_u8wos_08a X86 Copyright 2009 Sun Microsystems, Inc. All Rights Reserved. Use is subject to license terms. Assembled 16 September 2009
# java -fullversion
java full version 1.6.0_23-b05
# java -version
java version 1.6.0_23
Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)

I changed initial_token: 0

On Fri, Feb 11, 2011 at 4:40 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote:
> Hi, because I can't access the archives of the mailing list, my apologies if someone has asked this before. Has anyone successfully run Cassandra on Solaris 10 x64 clusters?
> Regards, Xiaobo Gu

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Data ends up in wrong Columnfamily
Hi,

I am experiencing a strange issue. I have two applications writing to Cassandra (into different column families in the same keyspace). The applications reside on different machines and know nothing about the existence of each other. They both produce data and write it to Cassandra with batch mutations using Hector. So far so good, but it regularly happens that data from one application ends up in the columnfamilies reserved for the other application as well as in the intended columnfamily.

Machine A writes to column family CF_A.
Machine B writes to column families CF_B to CF_N.
Regularly, data that was written (according to my application logs) from machine A to CF_A ends up in CF_A and in one of the other columnfamilies.

Any ideas why this could be happening? I am using Cassandra 0.7.0 and Hector 0.7.0-23.

Greetings,
Roland
strange issue with timeUUID columns
Hi,

I am experiencing a strange issue when using TimeUUIDs as column keys. I am storing a number of events, with a TimeUUID as key, in a row. Later I try to query for a slice of that row with a given lower-bound TimeUUID and upper-bound TimeUUID (constructed as described in the wiki).

If I insert the events in ascending order, everything goes well. If for some reason I insert the events in random order (which may very well happen in a concurrent scenario) and later query for the data (even with much more tolerant bounds), I get no data back. Furthermore, if I wait for some time (about 15 minutes seems to be sufficient), I can query the data again.

The Cassandra I use is a single-node 0.7.0-rc2, and I am querying with Hector. Has anyone else experienced such issues? Can someone think of an explanation for this?

Kind regards,
Roland
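For what it's worth, the wiki's recipe for slice bounds can be sketched as follows: build version-1 UUIDs whose timestamp field encodes the desired bound and whose clock_seq/node fields are pinned to extreme values, so every real TimeUUID in the interval sorts between them under TimeUUIDType's time-first comparison. A Python sketch (field packing per RFC 4122; whether this explains the missing data above is a separate question):

```python
import uuid

# 100-ns ticks between the Gregorian epoch (1582-10-15) and the Unix epoch.
GREGORIAN_OFFSET = 0x01B21DD213814000

def time_uuid_bound(unix_seconds, low=True):
    """Boundary version-1 UUID for a TimeUUID slice query (sketch).

    With low=True the clock_seq/node fields are minimal, so the UUID
    sorts before any real TimeUUID carrying the same timestamp; with
    low=False they are maximal (within the RFC 4122 variant) and it
    sorts after.
    """
    ticks = int(unix_seconds * 10_000_000) + GREGORIAN_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1 bits
    if low:
        clock_hi, clock_lo, node = 0x80, 0x00, 0x000000000000
    else:
        clock_hi, clock_lo, node = 0xBF, 0xFF, 0xFFFFFFFFFFFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_hi, clock_lo, node))

lo = time_uuid_bound(1297400000)                    # start of the slice
hi = time_uuid_bound(1297400000 + 3600, low=False)  # one hour later
assert lo.version == hi.version == 1
assert lo.time < hi.time  # TimeUUIDType orders by the time component first
```

Bounds built this way depend only on wall-clock timestamps, not on insertion order, so they should bracket events regardless of the order in which they arrived.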