mixed linux/windows cluster in Cassandra-1.2
Hello! Is a mixed Linux/Windows cluster configuration supported in 1.2? Cheers, Ilya Shipitsin
Re: Sorting keys for batch reads to minimize seeks
I am not sure what you are working on will have an effect. You cannot actually control the way the operating system seeks data on disk; the IO scheduling is done outside Cassandra. You can try to write the code in an optimistic way, taking the physical hardware into account, but then you have to consider that there are n concurrent requests on the IO system.

On Friday, October 18, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote:

Read latency depends on many factors; don't forget physics. If it meets your requirements, it is good.

-----Original Message----- From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] Sent: Friday, October 18, 2013 1:03 PM To: user@cassandra.apache.org Subject: Re: Sorting keys for batch reads to minimize seeks

Hi, thanks for your reply. Our latency currently is 23.618 ms. However, I simply read that off one node just now while it wasn't under a load test; I will have a better number after the next test run. What is a good value for read latency?

On 18/10/13 08:31, Viktor Jevdokimov wrote:

The only thing you may win is avoiding unnecessary network hops, if:
- you request sorted keys (by token) from the appropriate replica with ConsistencyLevel.ONE and dynamic_snitch: false;
- the nodes have the same load;
- the replica is not doing GC (GC pauses are much higher than internode communication).

For a multiple-key request, C* will do multiple single-key reads, except for range scan requests, where only a starting key and a batch size are used in the request. Consider a multiple-key request a slow request by design; try to model your data for low-latency single-key requests. So, what latencies do you want to achieve?

Best regards / Pagarbiai
Viktor Jevdokimov, Senior Developer
Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania

-----Original Message----- From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] Sent: Thursday, October 17, 2013 7:40 PM To: user@cassandra.apache.org Subject: Sorting keys for batch reads to minimize seeks

Hi, I am looking to somehow increase read performance on Cassandra. We are still playing with configurations, but I was wondering whether there are things we can do in software to speed up our reads. E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to Cassandra. The idea is that the row keys should be closer together on the physical disk, and therefore this may minimize the amount of random seeking we have to do when querying, say, 1000 entries from Cassandra. Does that make any sense? Is there anything else that we can do in software to improve performance, like specific batch sizes for reads? We are using the Astyanax library to access Cassandra. Thanks!
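For what it's worth, a minimal Java sketch of the key-sorting idea (the token function here is a placeholder assumption; a faithful version would compute the same Murmur3 token that Cassandra 1.2's default partitioner uses):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TokenSortedBatch {

    // Placeholder token function, for illustration only. Cassandra orders
    // rows on disk by partitioner token, not by raw key bytes, so a real
    // implementation must hash the key exactly as the partitioner does.
    static long tokenOf(String key) {
        return key.hashCode();
    }

    // Sort keys by token so that keys stored near each other (and owned by
    // the same replica) end up adjacent in the batch.
    static List<String> sortByToken(List<String> keys) {
        List<String> sorted = new ArrayList<String>(keys);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return Long.compare(tokenOf(a), tokenOf(b));
            }
        });
        return sorted;
    }
}

As Edward notes, the main benefit you could plausibly get from this is replica locality (fewer network hops), not control over disk seeks, which belong to the OS IO scheduler.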
Re: Is read performance improved by moving more volatile data to different CF?
I would say no. If you design around the row cache and your data access patterns change, your assertions will be invalidated and your performance may get worse over time. I would apply KISS here: keep it simple, using one column family. Experiment with size-tiered vs. leveled compaction.

On Thursday, October 17, 2013, Jan Algermissen jan.algermis...@nordsc.com wrote:

Hi, my rows consist of ~70 columns each, some containing small values, some containing larger amounts of content (think small documents). My data is occasionally updated and read several times per day as a complete paging through all rows. The updates usually affect only about 10% of the small-value columns. Speed of the full paging is of most interest to users. Given the very different volatility of the per-row data, do you think my read speed would dramatically improve by splitting the less frequently changed and the very frequently changed columns into two CFs, so I can enable the row cache for the seldom-changing, larger-sized portion of the data? Or would the effect likely be rather marginal? Jan
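A minimal CQL sketch of the compaction experiment Edward suggests (keyspace and table names are made up for illustration):

ALTER TABLE my_keyspace.my_table
  WITH compaction = { 'class': 'LeveledCompactionStrategy' };

-- and to switch back for comparison:
ALTER TABLE my_keyspace.my_table
  WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };

Leveled compaction tends to favor read-heavy workloads at the cost of more compaction IO, so it is worth benchmarking both against the full-paging access pattern described above.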
AUTO : Samuel CARRIERE is out of the office (retour 28/10/2013)
I am out of the office until 28/10/2013. Note: this is an automatic reply to your message "Re: Is read performance improved by moving more volatile data to different CF?" sent on 21/10/2013 18:03:29. This is the only notification you will receive while this person is away.
Re: Question about SizeTieredCompactionStrategy in C* 2.0: not all SSTables are being compacted
An easy way to test this would be to run stress or some other tool at a slow rate of inserts and watch the tables flush and compact naturally.

On Tuesday, October 8, 2013, Sameer Farooqui sam...@blueplastic.com wrote:

Hmm, good point. I'll test this out again and see whether the compaction behavior is as expected, given the relative sizes of the SSTables.

On Tue, Oct 8, 2013 at 3:06 PM, Tyler Hobbs ty...@datastax.com wrote:

Well, 6 was created by the other sstables being compacted, correct? If so, they were probably quite a bit smaller (~25% of the size). Once you have two more sstables of roughly that size, they should be compacted automatically.

On Tue, Oct 8, 2013 at 2:01 PM, Sameer Farooqui sam...@blueplastic.com wrote:

Thanks for the reply, Tyler. I thought that too, that maybe the SSTables are mismatched in size, but upon closer inspection that doesn't appear to be the case:

-rw-r--r-- 1 cassandra cassandra 227 Oct 7 23:26 demodb-users-jb-1-Data.db
-rw-r--r-- 1 cassandra cassandra 242 Oct 8 00:38 demodb-users-jb-6-Data.db

The two files look to be nearly the same size. There just appears to be something special about that first SSTable that keeps it from getting compacted.

On Tue, Oct 8, 2013 at 2:49 PM, Tyler Hobbs ty...@datastax.com wrote:

SizeTieredCompactionStrategy only compacts sstables that are a similar size (by default, they basically need to be within 50% of each other). Perhaps your first SSTable was very large or small compared to the others?

On Mon, Oct 7, 2013 at 8:06 PM, Sameer Farooqui sam...@blueplastic.com wrote:

Hi, I have a fresh 1-node C* 2.0 install with a demo keyspace created with the SizeTiered compaction strategy. I've noticed that in the beginning this keyspace has just one SSTable: demodb-users-jb-1-Data.db. But as I add more data to the table and do some flushes, the number of SSTables builds up. After I have a handful of SSTables, I trigger a flush using 'nodetool flush demodb users', but then not ALL of the SSTables get compacted. I've noticed that the first SSTable remains the same and doesn't disappear after the compaction, but the later SSTables do get compacted into one new Data file. Is there a reason why the first SSTable is special and does not disappear after compaction? Also, I think I noticed that if I wait a few days and run another compaction, that first SSTable does get compacted (and it disappears). Can someone help explain why the first SSTable behaves this way?

-- Tyler Hobbs, DataStax
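A rough Java sketch of the "similar size" rule Tyler describes (the 50% window corresponds to size-tiered compaction's default bucket_low of 0.5 and bucket_high of 1.5; exact behavior varies by version):

public class SizeTierBucketing {

    // An sstable joins a bucket when its size falls within
    // [bucket_low * avg, bucket_high * avg] of the bucket's average size.
    static boolean inSameBucket(long sstableSize, double bucketAverage) {
        return sstableSize >= 0.5 * bucketAverage
            && sstableSize <= 1.5 * bucketAverage;
    }
}

A bucket is only compacted once it contains at least min_threshold sstables (4 by default), which is why a lone similar-sized pair like jb-1 and jb-6 can sit uncompacted until more sstables of that size accumulate.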
Re: Is read performance improved by moving more volatile data to different CF?
Stupid cell phone. I would say no. If you design around the row cache and your data access patterns change, the original assertions may be invalidated and the performance might be worse than with the simple design.
Re: MemtablePostFlusher pending
On Mon, Oct 21, 2013 at 2:17 AM, Kais Ahmed k...@neteck-fr.com wrote:

We have recently put into production a new cluster, C* 2.0.0, with 3 nodes and RF 3.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ ("What Version of Cassandra Should I Run in Production?")

If I were you, I would probably try to read my data into a 1.2.x cluster; downgrading versions on the same cluster is unlikely to work. You could try going to 2.0.1, but there is no compelling reason to believe this will fix your problem. =Rob
decommission of one EC2 node in cluster causes other nodes to go DOWN/UP and results in May not be enough replicas...
We have a 6-node Cassandra 1.2.10 cluster running on AWS with NetworkTopologyStrategy, a replication factor of 3 and the EC2Snitch. Each AWS availability zone has 2 nodes in it. When we read or write data with consistency QUORUM while decommissioning a node, we get 'May not be enough replicas present to handle consistency level'. This doesn't make sense: we are only taking one node down, and we have an RF of three, so even with one node down a quorum read/write should still find enough nodes holding the data (2).

Looking at the Cassandra log on a server that we are not decommissioning, we see this during the decommission of the other node:

INFO [GossipTasks:1] 2013-10-21 15:18:10,695 Gossiper.java (line 803) InetAddress /10.0.22.142 is now DOWN
INFO [GossipTasks:1] 2013-10-21 15:18:10,696 Gossiper.java (line 803) InetAddress /10.0.32.159 is now DOWN
INFO [HANDSHAKE-/10.0.22.142] 2013-10-21 15:18:10,862 OutboundTcpConnection.java (line 399) Handshaking version with /10.0.22.142
INFO [GossipTasks:1] 2013-10-21 15:18:11,696 Gossiper.java (line 803) InetAddress /10.0.12.178 is now DOWN
INFO [GossipTasks:1] 2013-10-21 15:18:11,697 Gossiper.java (line 803) InetAddress /10.0.22.106 is now DOWN
INFO [GossipTasks:1] 2013-10-21 15:18:11,698 Gossiper.java (line 803) InetAddress /10.0.32.248 is now DOWN

Eventually we see a message like this for each of the nodes:

INFO [GossipStage:3] 2013-10-21 15:18:19,429 Gossiper.java (line 789) InetAddress /10.0.32.248 is now UP

So eventually the remaining nodes in the cluster come back to life. While these nodes are down I can see why we get the 'May not be enough replicas...' message, because everything is down. My question is: why does gossip mark these nodes, which we aren't decommissioning, as DOWN in the first place?

-- John Pyeatt, Singlewire Software, LLC, www.singlewire.com, 608.661.1184, john.pye...@singlewire.com
Re: mixed linux/windows cluster in Cassandra-1.2
I can't imagine any situation where this would be practical. What would be the reason to even consider this?

On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote:

On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote: is a mixed linux/windows cluster configuration supported in 1.2?

I don't think it's officially supported in any version; you would be among a very small number of people operating in this way. However, there is no technical reason it shouldn't work. =Rob
Re: upgrading Cassandra server hardware best practice?
On Fri, Oct 18, 2013 at 3:27 PM, Arindam Barua aba...@247-inc.com wrote:

Is step 1 just to reduce downtime for the node?

Yes.

Also, I'm assuming the initial_token of the new node should be set to be the same as the token of the old node, or close to that. E.g. [1] in "Replacing a Dead Node" talks about setting the new node's initial_token to the value of the dead token minus 1. (I'm not sure why the offset by 1 helps.)

Using a new initial_token is for the case where you cannot manually copy the data and replace the token owner in place by using auto_bootstrap: false.

If the number of hosts with the new hardware (TBD) is different than the old, after doing what you suggested, I guess I can then follow the regular steps for adding and removing nodes.

Yes, or use nodetool move followed by nodetool cleanup. =Rob
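A sketch of the cassandra.yaml settings for the copy-data-in-place approach Rob describes; the token value is a made-up example, and you would use the old node's actual token as reported by nodetool ring:

# On the replacement node, after copying the old node's data directory over:
auto_bootstrap: false
initial_token: 113427455640312821154458202477256070485  # the old node's token (example value)

With the data already present and the same token assigned, the node takes over the old node's range immediately instead of streaming it.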
Fwd: {kundera-discuss} Kundera 2.8 released
fyi.

---------- Forwarded message ----------
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Tue, Oct 22, 2013 at 1:33 AM
Subject: {kundera-discuss} Kundera 2.8 released
To: kundera-disc...@googlegroups.com

Hi All,

We are happy to announce the release of Kundera 2.8. Kundera is a JPA 2.0-compliant object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, Redis, OracleNoSQL, Neo4j, ElasticSearch, CouchDB and relational databases.

Major Changes:
1) Support for CouchDB as a datastore.
2) Support for MappedSuperclass and the JPA inheritance strategy.

Github Bug Fixes:
https://github.com/impetus-opensource/Kundera/pull/409
https://github.com/impetus-opensource/Kundera/issues/396
https://github.com/impetus-opensource/Kundera/issues/379
https://github.com/impetus-opensource/Kundera/issues/340
https://github.com/impetus-opensource/Kundera/issues/327
https://github.com/impetus-opensource/Kundera/issues/320
https://github.com/impetus-opensource/Kundera/issues/261
https://github.com/impetus-opensource/Kundera/pull/142
https://github.com/impetus-opensource/Kundera/issues/55
https://github.com/impetus-opensource/Kundera/issues/420
https://github.com/impetus-opensource/Kundera/issues/414
https://github.com/impetus-opensource/Kundera/issues/411
https://github.com/impetus-opensource/Kundera/issues/401
https://github.com/impetus-opensource/Kundera/issues/378
https://github.com/impetus-opensource/Kundera/issues/354
https://github.com/impetus-opensource/Kundera/issues/315
https://github.com/impetus-opensource/Kundera/issues/298
https://github.com/impetus-opensource/Kundera/issues/204
https://github.com/impetus-opensource/Kundera/issues/179
https://github.com/impetus-opensource/Kundera/issues/128
https://github.com/impetus-opensource/Kundera/issues/432
https://github.com/impetus-opensource/Kundera/issues/422

How to Download:
To download, use or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera
The latest released tag version is 2.8. Kundera maven libraries are available at: https://oss.sonatype.org/content/repositories/releases/com/impetus
Sample code and examples for using Kundera can be found here: https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Survey/Feedback: http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and for using Kundera!

Sincerely, Kundera Team
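For readers new to Kundera, a minimal sketch of what the JPA mapping looks like (the entity, keyspace and persistence-unit names are invented for illustration):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Kundera maps a JPA entity to a Cassandra column family; the schema
// attribute follows Kundera's "keyspace@persistence-unit" convention.
@Entity
@Table(name = "users", schema = "demodb@cassandra_pu")
public class User {

    @Id
    private String userId;

    @Column(name = "first_name")
    private String firstName;

    // getters and setters omitted for brevity
}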
Re: mixed linux/windows cluster in Cassandra-1.2
The technical reason is the path separator, which is different on Linux and Windows. If you search through the mailing list, you will find evidence that it does not work and is not supported. The most recent notice I found, though, concerned 0.7, and there was no JIRA bug number; it was simply unsupported.

On Tuesday, October 22, 2013, Robert Coli wrote:

On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote: is a mixed linux/windows cluster configuration supported in 1.2?

I don't think it's officially supported in any version; you would be among a very small number of people operating in this way. However, there is no technical reason it shouldn't work. =Rob
Re: mixed linux/windows cluster in Cassandra-1.2
We want to migrate a cluster of some hundred gigabytes from Windows to Linux without interrupting operation, i.e. node by node.

On Tuesday, October 22, 2013, Jon Haddad wrote:

I can't imagine any situation where this would be practical. What would be the reason to even consider this?
Re: mixed linux/windows cluster in Cassandra-1.2
We ran a Cassandra LAN party once with a mixed environment: http://www.datastax.com/dev/blog/cassandra-nyc-lan-party. That was obviously a trivial setup. I think the areas of concern would be column families located on different devices, and streaming-related issues. It might work just fine, but I would test the migration first before just trying to run in mixed mode.

On Mon, Oct 21, 2013 at 4:15 PM, Илья Шипицин chipits...@gmail.com wrote:

We want to migrate a cluster of some hundred gigabytes from Windows to Linux without interrupting operation, i.e. node by node.
Re: Huge multi-data center latencies
So it turned out the DataStax Java client round-robins servers by default, which caused the periodic huge latencies. Switching to DCAwareRoundRobinPolicy solved the problem. Another question: how do you get the local DC name? The application can parse conf/cassandra-topology.properties manually, but since the server already knows which DC it belongs to, it would be nice if the client could just ask for the local DC without giving an actual name. Hobin

On Sat, Oct 19, 2013 at 5:18 PM, Hobin Yoon hobiny...@gmail.com wrote:

I am experiencing huge latencies with a multi-data-center Cassandra cluster. With consistency level ONE, I expected almost the same latency as with the single-data-center setup. What could possibly affect the latency in a multi-data-center setup?

Setup      min (ms)  max (ms)  avg (ms)
multi-DC   1         969       164.554264
single-DC  1         51        2.371786

I am using the DataStax Java client library (https://github.com/datastax/java-driver). This is the keyspace description:

~/work/cassandra/bin$ ./cqlsh `hostname`
Connected to cdbp at mdc-s70:9160.
[cqlsh 3.1.7 | Cassandra 1.2.9-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> desc keyspace pbdp;

CREATE KEYSPACE pbdp WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC2': '1',
  'DC3': '1',
  'DC0': '1',
  'DC1': '1'
};

USE pbdp;

CREATE TABLE tweet (
  tid bigint PRIMARY KEY,
  created_at_rt bigint,
  created_at_st text,
  lati float,
  longi float,
  real_coord boolean,
  sn text,
  text_ text
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

Thanks, Hobin
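For reference, a minimal sketch of the fix with the DataStax Java driver of that era (the contact point and DC name are example values):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class DcAwareClient {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")  // example contact point in the local DC
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC0"))  // pin queries to the local DC
                .build();
        // ... build a Session and run queries ...
        cluster.shutdown();
    }
}

On the naming question: if I remember correctly, the driver can also infer the local DC from the first contact point when DCAwareRoundRobinPolicy is constructed without an explicit name, so the application need not hard-code it.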
Re: Wide rows/composite keys clarification needed
So, looking at Patrick McFadin's data modeling videos, I now know about using compound keys as a way of partitioning data on a by-day basis. My other questions probably go more to the storage engine itself. How do you refer to the columns in the wide row? What kind of names are assigned to the columns? Les

On Oct 20, 2013 9:34 PM, Les Hartzman lhartz...@gmail.com wrote:

Please correct me if I'm not describing this correctly, but if I am collecting sensor data and have a table defined as follows:

create table sensor_data (
  sensor_id int,
  time_stamp int,  // time to the hour granularity
  voltage float,
  amp float,
  PRIMARY KEY (sensor_id, time_stamp)
);

the partitioning value is the sensor_id, and the rest of the PK components become part of the column name for the additional fields, in this case voltage and amp. What goes into determining what additional data is inserted into this row? The first time an insert takes place there will be one entry for all of the fields. Is there anything besides the sensor_id that is used to determine that subsequent insertions for that sensor will go into the same row as opposed to starting a new row? Based on something I read (but can't currently find again), I thought that as long as all of the elements of the PK remain the same (same sensor_id and still within the same hour as the first reading), the next insertion would be tacked onto the end of the first row. Is this correct? For subsequent entries into the same row for additional voltage/amp readings, what are the names of the columns for these readings? My understanding is that the column name becomes a concatenation of the non-row-key field names plus the data field names. So if the first time around you have time_stamp:voltage and time_stamp:amp, what do the subsequent column names become? Thanks. Les
Re: Wide rows/composite keys clarification needed
If you're working with CQL, you don't need to worry about the column names; they're handled for you. If you specify multiple keys as part of the primary key, the ones after the first become clustering keys and are mapped to the column names. So if you have sensor_id / time_stamp, all your sensor readings will be in the same row in the traditional Cassandra sense, sorted by your time_stamp.

On Oct 21, 2013, at 4:27 PM, Les Hartzman lhartz...@gmail.com wrote:

So, looking at Patrick McFadin's data modeling videos, I now know about using compound keys as a way of partitioning data on a by-day basis. My other questions probably go more to the storage engine itself. How do you refer to the columns in the wide row? What kind of names are assigned to the columns? Les
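To make the mapping concrete, here is roughly what the internal storage row for the sensor_data table above looks like, in the style of cassandra-cli output (the timestamps and values are invented for illustration; exact rendering varies by version):

RowKey: 42
=> (name=1382400000:, value=)             // CQL row marker for clustering value 1382400000
=> (name=1382400000:amp, value=1.5)
=> (name=1382400000:voltage, value=220.0)
=> (name=1382403600:, value=)
=> (name=1382403600:amp, value=1.4)
=> (name=1382403600:voltage, value=221.0)

Each clustering value (the time_stamp) is prepended to each non-key column's name, so a later reading in the same partition simply appends new cells sorted after the earlier ones; nothing about the existing cells changes.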
Re: Wide rows/composite keys clarification needed
What if you plan on using Kundera and JPQL and not CQL? Les

On Oct 21, 2013 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

If you're working with CQL, you don't need to worry about the column names; they're handled for you. If you specify multiple keys as part of the primary key, they become clustering keys and are mapped to the column names. So if you have sensor_id / time_stamp, all your sensor readings will be in the same row in the traditional Cassandra sense, sorted by your time_stamp.
Re: Wide rows/composite keys clarification needed
So I just saw a post about how Kundera translates all JPQL to CQL.

On Mon, Oct 21, 2013 at 4:45 PM, Jon Haddad j...@jonhaddad.com wrote:

If you're working with CQL, you don't need to worry about the column names; they're handled for you. If you specify multiple keys as part of the primary key, they become clustering keys and are mapped to the column names. So if you have sensor_id / time_stamp, all your sensor readings will be in the same row in the traditional Cassandra sense, sorted by your time_stamp.
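For what that looks like from the application side, a minimal JPA sketch using the standard javax.persistence API (the persistence-unit name "cassandra_pu" and the SensorData entity are assumptions for illustration; Kundera translates the JPQL to CQL under the hood):

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import javax.persistence.Query;

public class JpqlExample {
    public static void main(String[] args) {
        // "cassandra_pu" would be defined in persistence.xml with Kundera as the provider
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("cassandra_pu");
        EntityManager em = emf.createEntityManager();

        // Plain JPQL; the provider turns this into a CQL SELECT against the sensor_data table
        Query q = em.createQuery("SELECT s FROM SensorData s WHERE s.sensorId = 42");
        List<?> readings = q.getResultList();

        em.close();
        emf.close();
    }
}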
Opening multiple contexts...
Hello, I am new to the Cassandra world and would like to know if it is possible to open multiple keyspaces from a single program. I am using the libQtCassandra library. Is it possible to open multiple contexts and multiple tables, and store different data into different contexts and tables, within the same program? I tried creating two QCassandra objects and opening two contexts, but it results in a core dump. Please help. Thanks in advance! -- Regards, BNSK
Unable to start dse 3.1.4 server on Mac as a process
New to Cassandra and struggling to get the DSE server started. Any help is appreciated! Thanks so much!

...
INFO 21:04:08,568 Initializing system.Schema
INFO 21:04:08,576 Initializing system.schema_keyspaces
INFO 21:04:08,582 Initializing system.range_xfers
INFO 21:04:08,587 Initializing system.HintsColumnFamily
INFO 21:04:08,591 Initializing system.schema_columnfamilies
INFO 21:04:08,597 Initializing system.NodeIdInfo
INFO 21:04:08,602 Initializing system.schema_columns
INFO 21:04:08,607 Initializing system.IndexInfo
INFO 21:04:08,612 Initializing system.Migrations
INFO 21:04:08,616 Initializing system.peers
ERROR 21:04:08,619 Exception encountered during startup
java.lang.RuntimeException: Can't open incompatible SSTable! Current version ic, found file: /var/lib/cassandra/data/system/local/system-local-jb-5
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:376)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:353)
    at org.apache.cassandra.db.Table.initCf(Table.java:329)
    at org.apache.cassandra.db.Table.init(Table.java:272)
    at org.apache.cassandra.db.Table.open(Table.java:109)
    at org.apache.cassandra.db.Table.open(Table.java:87)
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:478)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:242)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:137)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:446)
    at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:334)
Exception encountered during startup: Can't open incompatible SSTable! Current version ic, found file: /var/lib/cassandra/data/system/local/system-local-jb-5

- Rich