Re: Materialized views and composite partition keys
Hello, I've just changed my materialized view to have one partition key. The view gets generated now. After some refactoring I found that I didn't need a composite primary key at all. However if I later need one then I'll use a UDT. If it works... On Wed, 10 Feb 2016 at 13:04 DuyHai Doan <doanduy...@gmail.com> wrote: > You can't have more than 1 non-pk column from the base table as primary > key column of the view. All is explained here: > http://www.doanduyhai.com/blog/?p=1930 > > On Wed, Feb 10, 2016 at 10:43 AM, Abdul Jabbar Azam <aja...@gmail.com> > wrote: > >> Hello, >> >> I tried creating a material view using a composite partition key but I >> got an error. I can't remember the error but it was complaining about the >> presence of the second field in the partition key. >> >> Has anybody experienced this or have a workaround. I haven't tried UDT's >> yet. >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
Materialized views and composite partition keys
Hello, I tried creating a materialized view using a composite partition key but I got an error. I can't remember the exact error, but it complained about the presence of the second field in the partition key. Has anybody experienced this, or found a workaround? I haven't tried UDTs yet. -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: Materialized views and composite partition keys
Ah. I think that's where I'm going wrong. I'll have a look when I get home. On Wed, 10 Feb 2016 at 13:04 DuyHai Doan <doanduy...@gmail.com> wrote: > You can't have more than 1 non-pk column from the base table as primary > key column of the view. All is explained here: > http://www.doanduyhai.com/blog/?p=1930 > > On Wed, Feb 10, 2016 at 10:43 AM, Abdul Jabbar Azam <aja...@gmail.com> > wrote: > >> Hello, >> >> I tried creating a material view using a composite partition key but I >> got an error. I can't remember the error but it was complaining about the >> presence of the second field in the partition key. >> >> Has anybody experienced this or have a workaround. I haven't tried UDT's >> yet. >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
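To make DuyHai's rule concrete, here is a hedged CQL sketch (the table and column names are invented, not from the thread): in Cassandra 3.x a materialized view's primary key must contain every primary key column of the base table plus at most one other column, but that primary key can still be composite.

```sql
-- Hypothetical base table
CREATE TABLE readings (
    sensor_id uuid,
    day       text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
);

-- Valid: the view's key is composite, reuses all base PK columns,
-- and adds only ONE non-PK column of the base table (value)
CREATE MATERIALIZED VIEW readings_by_value AS
    SELECT * FROM readings
    WHERE value IS NOT NULL AND sensor_id IS NOT NULL
      AND day IS NOT NULL AND ts IS NOT NULL
    PRIMARY KEY ((value, day), sensor_id, ts);

-- A view key with two non-PK base columns would be rejected with the
-- error discussed in this thread
```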
cassandra client testing
Hello, What do people do to test their Cassandra client code? Do you a) mock out the Cassandra code, b) use a framework which simulates Cassandra, or c) actually use Cassandra, perhaps inside Docker? -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: cassandra client testing
This looks really good. I can see in master that Java driver 3.0 support has been added. I can't work out how to generate exceptions, though. I'd like to test my Akka supervisor hierarchy as well. On Tue, 9 Feb 2016 at 22:48 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > http://www.scassandra.org/ > > From: Abdul Jabbar Azam > Reply-To: "user@cassandra.apache.org" > Date: Tuesday, February 9, 2016 at 2:23 PM > To: "user@cassandra.apache.org" > Subject: cassandra client testing > > Hello, > > What do people do to test their cassandra client code? Do you > > a) mock out the cassandra code > b) use a framework which simulates cassandra > c) actually use cassandra, perhaps inside docker > > > -- > Regards > > Abdul Jabbar Azam > twitter: @ajazam > -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: cassandra client testing
Hello Will, I'll give scassandra a try first, otherwise use a test keyspace. On Tue, 9 Feb 2016 at 22:52 Will Hayworth <whaywo...@atlassian.com> wrote: > I've never seen Scassandra before--neat! > > For what it's worth, we just use a test keyspace with a lower RF (that is > to say, 1). The tables are identical to our prod keyspace, but the > permissions are different for the user on our Bamboo instances so that we > can test things like table creation etc. > > ___ > Will Hayworth > Developer, Engagement Engine > Atlassian > > My pronoun is "they". <http://pronoun.is/they> > > > > On Tue, Feb 9, 2016 at 2:47 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> > wrote: > >> http://www.scassandra.org/ >> >> From: Abdul Jabbar Azam >> Reply-To: "user@cassandra.apache.org" >> Date: Tuesday, February 9, 2016 at 2:23 PM >> To: "user@cassandra.apache.org" >> Subject: cassandra client testing >> >> Hello, >> >> What do people do to test their cassandra client code? Do you >> >> a) mock out the cassandra code >> b) use a framework which simulates cassandra >> c) actually use cassandra, perhaps inside docker >> >> >> -- >> Regards >> >> Abdul Jabbar Azam >> twitter: @ajazam >> > > -- Regards Abdul Jabbar Azam twitter: @ajazam
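For option (a), one common pattern is to hide the driver session behind a thin repository and substitute a mock in tests, which also makes exception injection trivial (the thing Scassandra priming is meant to cover). A minimal Python sketch with made-up names (`UserRepository`, the `users` table are illustrative, not a real API):

```python
from unittest.mock import MagicMock

# Hypothetical repository wrapping a driver session (names invented)
class UserRepository:
    def __init__(self, session):
        self.session = session

    def find_email(self, user_id):
        row = self.session.execute(
            "SELECT email FROM users WHERE id = %s", (user_id,)).one()
        return row.email if row else None

# Replace the session with a mock -- no running cluster needed
session = MagicMock()
session.execute.return_value.one.return_value = None
repo = UserRepository(session)
print(repo.find_email(42))  # no such user -> None

# Inject a driver failure to exercise error-handling paths
session.execute.side_effect = RuntimeError("simulated timeout")
try:
    repo.find_email(42)
except RuntimeError as exc:
    print("caught:", exc)
```

The same `side_effect` trick is one way to drive an Akka-style supervisor test: make the session raise, then assert the supervisor reacts.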
Re: What are the best ways to learn Apache Cassandra
The documentation at www.datastax.com is very good. So I would recommend that. On Sat, 19 Dec 2015, 09:21 Akhil Mehra <akhilme...@gmail.com> wrote: > What are some things you wish you knew when you started learning Apache > Cassandra. > > What are some of the best resources you have come across to learn Apache > Cassandra. Books, blogs etc. I am looking for tips on key concepts, > principles that you wish you were exposed to when you started learning > Apache Cassandra. > > What were the main pain points when trying to get to grips with Cassandra. > > Essentially I am looking for all tips that will help shorten the learning > curve. > > Thanks > Regards, > Akhil Mehra > -- Regards Abdul Jabbar Azam twitter: @ajazam
Re: Using Cassandra for geospacial search
Hello, You'll find this useful http://www.slideshare.net/mobile/mmalone/working-with-dimensional-data-in-distributed-hash-tables It's how SimpleGeo used geohashing and Cassandra for geolocation. On Mon, 26 Jan 2015 15:48 SEGALIS Morgan msega...@gmail.com wrote: Hi everyone, I wanted to know if someone has feedback on using the geohash algorithm with Cassandra? I will have to create a nearby functionality soon, and I really would like to do it with Cassandra for its scalability; otherwise the smart choice would apparently be MongoDB. Can Cassandra be used to do geospatial search (with some kind of radius) while being fast and scalable? Thanks. -- Morgan SEGALIS
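For anyone who wants to experiment before reading the slides: geohashing interleaves longitude and latitude bits and base32-encodes them, so nearby points share a common prefix, which is exactly what makes it usable as a Cassandra partition key. A self-contained sketch (precision 11 is roughly centimetre scale):

```python
# Self-contained geohash encoder: interleave lon/lat bits, base32-encode.
# Nearby points share a prefix, so the prefix works as a partition key.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=11):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, ch, bit, even = [], 0, 0, True  # even bits refine longitude
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        even = not even
        bit += 1
        if bit == 5:           # every 5 bits -> one base32 character
            chars.append(BASE32[ch])
            ch, bit = 0, 0
    return "".join(chars)

print(geohash_encode(57.64911, 10.40744))  # -> u4pruydqqvj
```

Storing the first N characters as the partition key buckets nearby points together; a radius query then becomes a small set of prefix lookups plus client-side distance filtering.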
Re: Best approach in Cassandra (+ Spark?) for Continuous Queries?
Hello, Or you can have a look at Akka http://www.akka.io for event processing and use Cassandra for persistence (Peter's suggestion). On Sat Jan 03 2015 at 11:59:45 AM Peter Lin wool...@gmail.com wrote: It looks like you're using the wrong tool and architecture. If the use case really needs continuous query like event processing, use an ESP product to do that. You can still store data in Cassandra for persistence. The design you want is to have two paths: event stream and persistence. At the entry point, the system makes parallel calls. One goes to a messaging system that feeds the ESP and a second that calls Cassandra Sent from my iPhone On Jan 3, 2015, at 5:46 AM, Hugo José Pinto hugo.pi...@inovaworks.com wrote: Hello. We're currently using Hazelcast (http://hazelcast.org/) as a distributed in-memory data grid. That's been working sort-of-well for us, but going solely in-memory has exhausted its path in our use case, and we're considering porting our application to a NoSQL persistent store. After the usual comparisons and evaluations, we're borderline close to picking Cassandra, plus eventually Spark for analytics. Nonetheless, there is a gap in our architectural needs that we're still not grasping how to solve in Cassandra (with or without Spark): Hazelcast allows us to create a Continuous Query in that, whenever a row is added/removed/modified from the clause's resultset, Hazelcast calls us back with the corresponding notification. We use this to continuously update the clients via AJAX streaming with the new/changed rows. This is probably a conceptual mismatch we're making, so - how to best address this use case in Cassandra (with or without Spark's help)? Is there something in the API that allows for Continuous Queries on key/clause changes (haven't found it)? Is there some other way to get a stream of key/clause updates? Events of some sort?
I'm aware that we could, eventually, periodically poll Cassandra, but in our use case, the client is potentially interested in a large number of table clause notifications (think all changes to Ship positions on California's coastline), and iterating out of the store would kill the streamer's scalability. Hence, the magic question: what are we missing? Is Cassandra the wrong tool for the job? Are we not aware of a particular part of the API or external library in/outside the apache realm that would allow for this? Many thanks for any assistance! Hugo
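Peter's two-path design boils down to a single entry point fanning out to a durable store and an event stream. A toy Python sketch (all names are illustrative; in production the bus would be an ESP/Akka/message queue and the store a Cassandra session):

```python
# Toy fan-out: one write goes to both a durable store and an event
# stream. All names are illustrative, not a real API.
class EventBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for callback in self.subscribers:
            callback(event)

class ShipPositionService:
    def __init__(self, store, bus):
        self.store = store  # persistence path (stand-in for Cassandra)
        self.bus = bus      # streaming path (stand-in for ESP/Akka)

    def update_position(self, ship_id, lat, lon):
        self.store[ship_id] = (lat, lon)       # durable write
        self.bus.publish((ship_id, lat, lon))  # push notification

received = []
bus = EventBus()
bus.subscribe(received.append)  # stand-in for the AJAX streamer
service = ShipPositionService({}, bus)
service.update_position("ship-1", 36.6, -121.9)
print(received)  # -> [('ship-1', 36.6, -121.9)]
```

The point is that the streamer never polls the store: notifications ride the event path, and Cassandra only serves reads and recovery.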
Re: Wide rows best practices and GC impact
Hello, I saw this earlier yesterday but didn't want to reply because I didn't know what the cause was. Basically I was using wide rows with Cassandra 1.x and was inserting data constantly. After about 18 hours the JVM would crash with a dump file. For some reason I removed the compaction throttling and the problem disappeared. I've never really found out what the root cause was. On Thu Dec 04 2014 at 2:49:57 AM Gianluca Borello gianl...@draios.com wrote: Thanks Robert, I really appreciate your help! I'm still unsure why Cassandra 2.1 seems to perform much better in that same scenario (even setting the same values of compaction threshold and number of compactors), but I guess we'll revise when we decide to upgrade to 2.1 in production. On Dec 3, 2014 6:33 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello gianl...@draios.com wrote: We mainly store time series-like data, where each data point is a binary blob of 5-20KB. We use wide rows, and try to put in the same row all the data that we usually need in a single query (but not more than that). As a result, our application logic is very simple (since we have to do just one query to read the data on average) and read/write response times are very satisfactory. This is a cfhistograms and a cfstats of our heaviest CF: 100mb is not HYOOOGE but is around the size where large rows can cause heap pressure. You seem to be unclear on the implications of pending compactions, however. Briefly, pending compactions indicate that you have more SSTables than you should. As compaction both merges row versions and reduces the number of SSTables, a high number of pending compactions causes problems associated with both having too many row versions (fragmentation) and a large number of SSTables (per-SSTable heap/memory (depending on version) overhead like bloom filters and index samples). In your case, it seems the problem is probably just the compaction throttle being too low.
My conjecture is that, given your normal data size and read/write workload, you are relatively close to GC pre-fail when compaction is working. When it stops working, you relatively quickly get into a state where you exhaust heap because you have too many SSTables. =Rob http://twitter.com/rcolidba PS - Given 30GB of RAM on the machine, you could consider investigating large-heap configurations, rbranson from Instagram has some slides out there on the topic. What you pay is longer stop the world GCs, IOW latency if you happen to be talking to a replica node when it pauses.
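For anyone hitting the same wall: the throttle Robert refers to is `compaction_throughput_mb_per_sec` in cassandra.yaml, and it can also be changed at runtime on a live node without a restart (a sketch; 0 disables throttling entirely, which is effectively what removing the throttle did above):

```
# cassandra.yaml (permanent setting, MB/s):
#   compaction_throughput_mb_per_sec: 64

# Runtime change on a live node (MB/s; 0 = unthrottled):
nodetool setcompactionthroughput 0

# Then watch pending compactions drain:
nodetool compactionstats
```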
Re: Storing time-series and geospatial data in C*
Spico, Here's a link for the time series data http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra You'll also need to understand the composite key format http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refCompositePk.html Mike Malone has done videos and slides on how they used an older version of Cassandra for storing geo information http://readwrite.com/2011/02/17/video-simplegeo-cassandra Or you can use Elasticsearch for working with geospatial information. http://blog.florian-hopf.de/2014/08/use-cases-for-elasticsearch-geospatial.html A word of warning though with Elasticsearch: it does not provide simple linear scalability like Cassandra, nor is it easy to set up for cross-datacentre operation. DataStax Enterprise has Solr integrated so you could use that http://digbigdata.com/geospatial-search-cassandra-datastax-enterprise/ Jabbar Azam On Thu Nov 27 2014 at 12:39:59 PM Spico Florin spicoflo...@gmail.com wrote: Hello! Can you please recommend me some new articles and case studies where Cassandra was used to store time-series and geo-spatial data? I'm particularly interested in best practices, data models and retrieval techniques. Thanks. Regards, Florin
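The two DataStax links above reduce to one core idea: bucket each series with a composite partition key so no partition grows without bound. A hedged CQL sketch (table and column names invented for illustration):

```sql
-- Hypothetical time-series layout: the composite partition key
-- buckets each series by day, bounding partition width
CREATE TABLE readings (
    sensor_id uuid,
    day       text,       -- e.g. '2014-11-27', part of the partition key
    ts        timestamp,
    value     blob,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- One partition = one sensor-day; a time-range read stays inside it
SELECT value FROM readings
 WHERE sensor_id = 62c36092-82a1-3a00-93d1-46196ee77204
   AND day = '2014-11-27'
   AND ts >= '2014-11-27 09:00:00' AND ts < '2014-11-27 10:00:00';
```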
Re: Redundancy inside a cassandra node
Hello Alexey, The node count is 20 per site and there will be two sites. RF=3. But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, based on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new DataStax C# drivers have improved a lot in terms of read performance. I'm curious: what is your definition of commodity? My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself; it's not necessary to make each node redundant. Cassandra has replication for that. Cassandra is also designed to run in multiple data centers - I think that redundancy policy is applicable for you. The only thing from what you said that you could deploy is RAID 10; the rest doesn't make any sense. As you are at the stage of designing your cluster, please provide some numbers: how much data will be stored on each node, and how many nodes would you have? What type of data will be stored in the cluster: binary objects or something time series? Cassandra is designed to run on commodity hardware. Sent from my iPad On 8 Nov 2014, at 6:26, Jabbar Azam aja...@gmail.com wrote: Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual nics, SSD's configured in raid 10.
The software team is saying that due to Cassandra's resilient nature, the way data is distributed, and its scalability, lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single NICs, single PSUs, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is: what do people use in the real world in terms of node resiliency when running a Cassandra cluster? Right now the team is only thinking of hosting Cassandra on the nodes. I'll see if I can twist their arms and get them to see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running Cassandra. Thanks Jabbar Azam
Re: Redundancy inside a cassandra node
Hello Jack, Some really good points. I never thought of JVM or OOM issues. Thanks Jabbar Azam On 8 November 2014 16:52, Jack Krupansky j...@basetechnology.com wrote: About the only thing you can say is two specific points: 1. A more resilient node is great, but it in no way reduces or eliminates the need for total nodes. Sometimes nodes become inaccessible due to network outages or system maintenance (e.g., software upgrades), or the vagaries of Java JVM and OOM issues. 2. Replication redundancy is also for supporting higher load, not just availability on node outage. -- Jack Krupansky *From:* Jabbar Azam aja...@gmail.com *Sent:* Friday, November 7, 2014 3:24 PM *To:* user@cassandra.apache.org *Subject:* Redundancy inside a cassandra node Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual nics, SSD's configured in raid 10. The software team is saying that due to cassandras resilient nature, due to the way data is distributed and scalability that lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single nics, PSU's, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is what do people use in the real world in terms of node resiliency when running a cassandra cluster? Right now the team is only thinking of hosting cassandra on the nodes. I'll see if I can twist their arms and see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running cassandra. Thanks Jabbar Azam
Re: Re[2]: Redundancy inside a cassandra node
With regards to money, I think it's always a good idea to find a cost-effective solution. The problem is different people have different interpretations of what cost effectiveness means. I'm referring to my organisation ;) and I'm sure it happens in other organisations. Biases, politics, experience and how stuff is currently done dictate how new solutions are created. I think the idea of not using redundancy goes against current thinking, unfortunately. Especially not using RAID 10. I think the problem may be due to lack of know-how of devops and tools like Cobbler, Ansible, Chef and Puppet. I'm working on this, but it's hard work doing this in my spare time. Do you build your own nodes, or use a well-known brand like Dell or HP? Dell recommended R720 or R320 nodes for the Cassandra nodes. We have built our own dev nodes from consumer grade kit but because they have no redundancy they are not taken seriously for production nodes. They're not rack mount, which is a big no with respect to the IT department. Thanks Jabbar Azam On 8 November 2014 12:31, Plotnik, Alexey aplot...@rhonda.ru wrote: Let me speak from my heart. I maintain a 200+TB Cassandra cluster. The problem is money. If your IT people have the $$$ they can of course deploy Cassandra on super-robust hardware with triple power supplies. But why then do you need Cassandra? Only for scalability? The idea of highly available clusters is to get robustness from availability (not from hardware reliability). The more availability (more nodes) you have, the more money you need to buy hardware. Cassandra is the most highly available system on the planet - it scales horizontally to any number of nodes. You have time series data; you can set replication factor 3 if needed.
There is a concept of network topology in Cassandra - you can specify which *failure domain* (racks or independent power lines) your nodes are installed on, and then replication will be computed correspondingly to store replicas of a given piece of data on different failure domains. The same is true for DCs - there is a concept of a data center in Cassandra topology; it knows about your data centers. You should think not about hardware but about your data model - is Cassandra applicable for your domain? Think about the queries to your data. Cassandra is actually a key-value storage (the documentation says it's a column-based storage, but it's just a CQL abstraction over key and binary value, nothing special except counters), so be very careful in designing your data model. Anyway, let me answer your original question: what do people use in the real world in terms of node resiliency when running a cassandra cluster? Nothing, because Cassandra is a highly available system. They use SSDs if they need speed. They do not use RAID 10 on the node, and they don't use dual power either, because it's not cheap in a cluster of many nodes and makes no sense because reliability is ensured by replication in large clusters. Not sure about dual NICs; network reliability is ensured by distributing your cluster across multiple data centers. We're using a single SSD and a single HDD on each node (we symlink some CF folders to the other disk). SSD for CFs where we need low latency, HDD for binary data. If one of them fails, replication saves us and we have time to deploy a new node and load data from replicas with Cassandra's repair feature back to the original node. And we have no problem with it; nodes fail sometimes, but it doesn't affect customers. That's it. -- Original Message -- From: Jabbar Azam aja...@gmail.com To: user@cassandra.apache.org user@cassandra.apache.org Sent: 08.11.2014 19:43:18 Subject: Re: Redundancy inside a cassandra node Hello Alexey, The node count is 20 per site and there will be two sites. RF=3.
But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, bases on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new datastax c# drivers have improved alot in terms of read performance. I'm curious. What is your definition of commodity. My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. The node Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself, it's not necessary to have redundant each node. Cassandra has replication for that. And also Cassandra is designed to run in multiple data center - am think that redundant policy is applicable for you. Only thing from your saying you can deploy is raid10, other don't make any sense. As you are in stage of designing you cluster, please provide some numbers: how many data will be stored on each node, how many nodes
Re: Re[2]: Redundancy inside a cassandra node
Hello Eric, You make a good point about resiliency being applied at a higher level in the stack. Thanks Jabbar Azam On 8 November 2014 14:24, Eric Stevens migh...@gmail.com wrote: They do not use Raid10 on the node, they don't use dual power as well, because it's not cheap in cluster of many nodes I think the point here is that money spent on traditional failure avoidance models is better spent in a Cassandra cluster by instead having more nodes of less expensive hardware. Rather than redundant disks network ports and power supplies, spend that money on another set of nodes in a different topological (and probably physical) rack. The parallel to having redundant disk arrays is to increase replication factor (RF=3 is already one replica better than Raid 10, and with fewer SPOFs). The only reason I can think you'd want to double down on hardware failover like the traditional model is if you are constrained in your data center (eg, space or cooling) and you'd rather run machines which are individually physically more resilient in exchange for running a lower RF. On Sat Nov 08 2014 at 5:32:22 AM Plotnik, Alexey aplot...@rhonda.ru wrote: Let me speak from my heart. I maintenance 200+TB Cassandra cluster. The problem is money. If your IT people have a $$$ they can deploy Cassandra on super robust hardware with triple power supply of course. But why then you need Cassandra? Only for scalability? The idea of high available clusters is to get robustness from availability (not from hardware reliability). More availability (more nodes) you have - more money you need to buy hardware. Cassandra is the most high available system on the planet - it scaled horizontally to any number of nodes. You have time series data, you can set replication factor 3 if needed. 
There is a concept of network topology in Cassandra - you can specify on which *failure domain* (racks or independent power lines) your nodes installed on, and then replication will be computed correspondingly to store replicas of a specified data on a different failure domains. The same is for DC - there is a concept of data center in Cassandra topology, it knows about your data centers. You should think not about hardware but about your data model - is Cassandra applicable for you domain? Thinks about queries to your data. Cassandra is actually a key value storage (documentation says it's a column based storage, but it's just an CQL-abstraction over key and binary value, nothing special except counters) so be very careful in designing your data model. Anyway, let me answer your original question: what do people use in the real world in terms of node resiliancy when running a cassandra cluster? Nothing because Cassandra is high available system. They use SSDs if they need speed. They do not use Raid10 on the node, they don't use dual power as well, because it's not cheap in cluster of many nodes and have no sense because reliability is ensured by replication in large clusters. Not sure about dual NICs, network reliability is ensured by distributing your cluster across multiple data centers. We're using single SSD and single HDD on each node (we symlink some CF folders to other disk). SSD for CFs where we need low latency, HDD for binary data. If one of them fails, replication save us and we have time to deploy new node and load data from replicas with Cassandra repair feature back to original node. And we have no problem with it, node fail sometimes, but it doesn't affect customers. That is. -- Original Message -- From: Jabbar Azam aja...@gmail.com To: user@cassandra.apache.org user@cassandra.apache.org Sent: 08.11.2014 19:43:18 Subject: Re: Redundancy inside a cassandra node Hello Alexey, The node count is 20 per site and there will be two sites. RF=3. 
But since the software isn't complete and the database code is going through a rewrite we aren't sure about space requirements. The node count is only a guess, bases on the number of dev nodes in use. We will have better information when the rewrite is done and testing resumes. The data will be time series data. It was binary blobs originally but we have found that the new datastax c# drivers have improved alot in terms of read performance. I'm curious. What is your definition of commodity. My IT people seem to think that the servers must be super robust. Personally I'm not sure if that should be the case. The node Thanks Jabbar Azam On 8 November 2014 02:56, Plotnik, Alexey aplot...@rhonda.ru wrote: Cassandra is a cluster itself, it's not necessary to have redundant each node. Cassandra has replication for that. And also Cassandra is designed to run in multiple data center - am think that redundant policy is applicable for you. Only thing from your saying you can deploy is raid10, other don't make any sense. As you are in stage of designing you cluster, please provide some numbers: how many
Redundancy inside a cassandra node
Hello all, My work will be deploying a cassandra cluster next year. Due to internal wrangling we can't seem to agree on the hardware. The software hasn't been finished, but management are asking for a ballpark figure for the hardware costs. The problem is the IT team are saying the nodes need to have multiple points of redundancy e.g. dual power supplies, dual NICs, SSDs configured in RAID 10. The software team is saying that due to Cassandra's resilient nature, the way data is distributed, and its scalability, lots of cheap boxes should be used. So they have been talking about self-build consumer grade boxes with single NICs, single PSUs, single SSDs etc. Obviously the self-build boxes will cost a fraction of the price, but each box is not as resilient as the first option. We don't use any cloud technologies, so that's out of the question. My question is: what do people use in the real world in terms of node resiliency when running a Cassandra cluster? Right now the team is only thinking of hosting Cassandra on the nodes. I'll see if I can twist their arms and get them to see the light with Apache Spark. Obviously there are other tiers of servers, but they won't be running Cassandra. Thanks Jabbar Azam
Re: Scala driver
Hello, I'm also using the Java driver. It's evolving the fastest and is simple to use. Thanks Jabbar Azam On 2 Sep 2014 06:15, Gary Zhao garyz...@gmail.com wrote: Thanks Jan. I decided to use the Java driver directly. It's not hard to use. On Sun, Aug 31, 2014 at 1:08 AM, Jan Algermissen jan.algermis...@nordsc.com wrote: Hi Gary, On 31 Aug 2014, at 07:19, Gary Zhao garyz...@gmail.com wrote: Hi Could you recommend a Scala driver and share your experiences of using it? I'm thinking of using the Java driver in Scala directly. I am using Martin’s approach without any problems: https://github.com/magro/play2-scala-cassandra-sample The actual mapping from Java to Scala futures for the async case is in https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala HTH, Jan Thanks
Re: Backup Cassandra to
Yes, I never thought of that. Thanks Jabbar Azam On 12 June 2014 19:45, Jeremy Jongsma jer...@barchart.com wrote: That will not necessarily scale, and I wouldn't recommend it - your backup node will need as much disk space as an entire replica of the cluster data. For a cluster with a couple of nodes that may be OK, for dozens of nodes, probably not. You also lose the ability to restore individual nodes - the only way to replace a dead node is with a full repair. On Thu, Jun 12, 2014 at 1:38 PM, Jabbar Azam aja...@gmail.com wrote: There is another way. You create a cassandra node in its own datacentre; then any changes going to the main cluster will be replicated to this node. You can back up from this node. In the event of a disaster, the data in the main cluster is wiped and the backup is replayed to the individual node. The data will then be replicated to the main cluster. This will also work for the case when the main cluster increases or decreases in size. Thanks Jabbar Azam On 12 June 2014 18:27, Andrew redmu...@gmail.com wrote: There isn’t a lot of “actual documentation” on the act of backing up, but I did research for my own company into the act of backing up and unfortunately, you’re not going to have a similar setup as Oracle. There are reasons for this, however. If you have more than one replica of the data, that means each node in the cluster will likely be holding its own unique set of data. So you would need to back up the ENTIRE set of nodes in order to get an accurate snapshot. Likewise, you would need to restore it to a cluster of the same size in order to restore it (and then run refresh to tell Cassandra to reload the tables from disk). Copying the snapshots is easy—it’s just a bunch of files in your data directory. It’s even smaller if you use incremental snapshots. I’ll admit, I’m no expert on tape drives, but I’d imagine it’s as easy as copy/pasting the snapshots to the drive (or whatever the equivalent tape drive operation is).
What you (and I, admittedly) would really like to see is a way to back up all the logical *data*, and then simply replay it. This is possible on Oracle because it’s typically restricted to a single instance (plus maybe one or two standbys) that don’t “share” any data. What you could do, in theory, is literally select all the data in the entire cluster and simply dump it to a file—but this could take hours, days, or even weeks to complete, depending on the size of your data—and then simply re-load it. This is probably not a great solution, but hey—maybe it will work for you. Netflix (thankfully) has posted a lot of their operational observations and whatnot, including their utility Priam. In their documentation, they include some overviews of what they use: https://github.com/Netflix/Priam/wiki/Backups Hope this helps! Andrew On June 12, 2014 at 6:18:57 AM, Jack Krupansky (j...@basetechnology.com) wrote: The doc for backing up – and restoring – Cassandra is here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_backup_restore_c.html That doesn’t tell you how to move the “snapshot” to or from tape, but a snapshot is the starting point for backing up Cassandra. -- Jack Krupansky *From:* Camacho, Maria (NSN - FI/Espoo) maria.cama...@nsn.com *Sent:* Thursday, June 12, 2014 4:57 AM *To:* user@cassandra.apache.org *Subject:* Backup Cassandra to Hi there, I'm trying to find information/instructions about backing up and restoring a Cassandra DB to and from a tape unit. I was hopping someone in this forum could help me with this since I could not find anything useful in Google :( Thanks in advance, Maria
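Condensing the thread into commands, a per-node snapshot-to-tape run might look like the sketch below (the keyspace, tag, data path and tape device are illustrative; on 2.0 the snapshot directory layout is `<data_dir>/<keyspace>/<table>/snapshots/<tag>`):

```
# Take a named, hard-linked snapshot of one keyspace on this node
nodetool snapshot -t nightly_20140612 my_keyspace

# Archive the snapshot directories to the tape device
tar -cf /dev/st0 /var/lib/cassandra/data/my_keyspace/*/snapshots/nightly_20140612

# Reclaim disk space once the tape write is verified
nodetool clearsnapshot -t nightly_20140612 my_keyspace
```

As Andrew notes, this has to be run on every node, and a restore needs a cluster of the same topology (copy the files back and run `nodetool refresh`).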
Re: CQL query regarding indexes
In this use case you don't need the secondary index. Instead use PRIMARY KEY (partition_id, senttime) Thanks Jabbar Azam On 12 Jun 2014 23:44, Roshan codeva...@gmail.com wrote: Hi Cassandra - 2.0.8 DataStax driver - 2.0.2 I have created a keyspace and a table with an index like below. CREATE TABLE services.messagepayload ( partition_id uuid, messageid bigint, senttime timestamp, PRIMARY KEY (partition_id) ) WITH compression = { 'sstable_compression' : 'LZ4Compressor', 'chunk_length_kb' : 64 }; CREATE INDEX idx_messagepayload_senttime ON services.messagepayload (senttime); While I am running the below query I am getting an exception. SELECT * FROM b_bank_services.messagepayload WHERE senttime >= 140154480 AND senttime <= 140171760 ALLOW FILTERING; com.datastax.driver.core.exceptions.InvalidQueryException: No indexed columns present in by-columns clause with Equal operator Could someone explain what's going on? I have created an index on the search column, but it seems not to be working. Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-query-regarding-indexes-tp7595122.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
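The suggested schema change can be sketched in CQL as follows. This is a sketch based on Roshan's original table; the partition key value and timestamp literals are illustrative only:

```cql
-- Make senttime a clustering column so that range predicates on it are
-- served directly from the partition, with no secondary index needed.
CREATE TABLE services.messagepayload (
    partition_id uuid,
    senttime timestamp,
    messageid bigint,
    PRIMARY KEY (partition_id, senttime)
) WITH compression = { 'sstable_compression' : 'LZ4Compressor',
                       'chunk_length_kb' : 64 };

-- A range query on the clustering column works once the partition key
-- is restricted, and needs no ALLOW FILTERING:
SELECT * FROM services.messagepayload
WHERE partition_id = 550e8400-e29b-41d4-a716-446655440000
  AND senttime >= '2014-06-01' AND senttime <= '2014-06-12';
```

The original error arises because a query that uses a secondary index must include at least one equality (=) predicate on an indexed column; range-only predicates on the indexed column are rejected with exactly the "No indexed columns present in by-columns clause with Equal operator" message seen above.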
Re: autoscaling cassandra cluster
Netflix uses Scryer http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html for predictive and reactive autoscaling, but they only refer to EC2 instances. They don't mention anything about Cassandra scaling or adding and removing nodes. I've just looked at the Priam wiki and it also doesn't mention scaling. It also mentions that vnodes aren't fully supported. That's no use for me as I'm using 2.x. The other issue, or rather feature, of Cassandra is that adding a new node increases the load on the system, so this surge would need to be taken into account. I think I'll leave this problem for more intelligent people than me and concentrate on the application logic, which can scale by adding or removing application and front end servers. Thanks for all your comments. Thanks Jabbar Azam On 22 May 2014 19:55, Robert Coli rc...@eventbrite.com wrote: On Wed, May 21, 2014 at 4:35 AM, Jabbar Azam aja...@gmail.com wrote: Has anybody got a cassandra cluster which autoscales depending on load or times of the day? Netflix probably does, managed with Priam. In general I personally do not consider Cassandra's mechanisms for joining and parting nodes to currently work well enough to consider designing a production system which would do so as part of regular operation. =Rob
autoscaling cassandra cluster
Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster. Scaling down has to be manual, where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this for the Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster.
I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: How to enable a Cassandra node to participate in multiple cluster
Hello Salih, As far as I'm aware a node can't be in two clusters. In the cassandra.yaml file you can only specify one cluster. The storage system and all the protocols would have to be modified so that information about multiple clusters is passed around. I'm sure somebody else could give you more accurate detail. If you're saving on hardware then you could think about using Docker or virtualisation, but you'll have problems with performance. A bit like the problems you get when you have small instances at Amazon. Thanks Jabbar Azam On 21 May 2014 19:07, Salih Kardan karda...@gmail.com wrote: Hello everyone, I want to use a Cassandra cluster for some specific purpose across data centers. What I want to figure out is how I can enable a single Cassandra node to participate in multiple clusters at the same time? I googled it, however I could not find any use case of Cassandra as I mentioned above. Is this possible with the current architecture of Cassandra? Salih
Re: autoscaling cassandra cluster
Hello James, How do you alter your cassandra.yaml file with each node's IP address? I want to use the scaling software (which I've not got yet) to create and destroy the GCE instances. I want to use fleet to deploy and undeploy the Cassandra nodes inside the Docker instances. I do realise I will have to run nodetool to add and remove the nodes from the cluster and also do the node cleanup. Disclaimer: this is not a production system but something I'm experimenting with in my own time. Thanks Jabbar Azam On 21 May 2014 15:51, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know and I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try. Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform and how instances can be created and destroyed. Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster.
Scaling down has to be manual where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentioned adding and removing nodes, unless I've missed something. I want to know how to do this for the google compute engine. This isn't for a production system but a test system(multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
Hello Ben, I'm looking forward to reading the Netflix links. Thanks :) Thanks Jabbar Azam On 21 May 2014 18:08, Ben Bromhead b...@instaclustr.com wrote: The mechanics of it are simple compared to figuring out when to scale, especially when you want to be scaling before peak load on your cluster (adding and removing nodes puts additional load on your cluster). We are currently building our own in-house solution for this for our customers. If you want to have a go at it yourself, this is a good starting point: http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html Most of this is fairly specific to Netflix, but an interesting read nonetheless. Datastax OpsCenter also provides capacity planning and forecasting and can provide an easy set of metrics you can make your scaling decisions on. http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359 On 21/05/2014, at 7:51 AM, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra docker images let me know and I'll shoot you a link. James Sent from my iPhone On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos Thanks Jabbar Azam On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently a guy sent this promising project called Mesos to this list. https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don’t have any personal opinion yet but maybe you could give it a try.
Regards, Panagiotis On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution and creating and destroying instances. I've found the following regarding GCE https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platformand how instances can be created and destroyed. I Thanks Jabbar Azam On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add a machines with the cluster/seed/datacenter conf and it should join the cluster. Scaling down has to be manual where you drain the node and decommission it. thanks, Prem On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a cassandra cluster which autoscales depending on load or times of the day? I've seen the documentation on the datastax website and that only mentioned adding and removing nodes, unless I've missed something. I want to know how to do this for the google compute engine. This isn't for a production system but a test system(multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether I use one performance metric or a mix of performance metrics and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling cassandra on demand. Thanks Jabbar Azam
Re: idempotent counters
Thanks Aaron. I've mitigated this by removing the dependency on idempotent counters. But it's good to know the limitations of counters. Thanks Jabbar Azam On 19 May 2014 08:36, Aaron Morton aa...@thelastpickle.com wrote: Does anybody else use another technique for achieving this idempotency with counters? The idempotency problem with counters has to do with what will happen when you get a timeout. If you retry the write there is a chance of the increment being applied twice. This is inherent in the current design. Cheers Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 9/05/2014, at 1:07 am, Jabbar Azam aja...@gmail.com wrote: Hello, Do people use counters when they want to have idempotent operations in cassandra? I have a use case for using a counter to check the count of objects in a partition. If the counter is more than some value then the data in the partition is moved into two different partitions. I can't work out how to do this splitting and recover if a problem happens during modification of the counter. http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2 explains that counters shouldn't be used if you want idempotency. I would agree, but the alternative is not very elegant. I would have to manually count the objects in a partition and then move the data and repeat the operation if something went wrong. It is less resource intensive to read a counter value to see if a partition needs splitting than to read all the objects in a partition. The counter value can be stored in its own table sorted in descending order of the counter value. Does anybody else use another technique for achieving this idempotency with counters? I'm using cassandra 2.0.7. Thanks Jabbar Azam
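For reference, a counter table and increment look like the following. This is a minimal sketch; the table and names are illustrative, not from the thread. The UPDATE is the non-idempotent step Aaron describes: if it times out and the client retries, the count may be applied twice:

```cql
CREATE TABLE object_counts (
    partition_key text PRIMARY KEY,
    object_count counter
);

-- Not idempotent: replaying this after a timeout can apply it twice.
UPDATE object_counts
SET object_count = object_count + 1
WHERE partition_key = 'partition-42';
```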
Re: CQL Datatype in Cassandra
Hello Techy Teck, I couldn't find any evidence on the datastax website but found this http://wiki.apache.org/cassandra/CassandraLimitations which I believe is correct. Thanks Jabbar Azam On 6 November 2013 20:19, Techy Teck comptechge...@gmail.com wrote: We are using a CQL table like this - CREATE TABLE testing ( description text, last_modified_date timeuuid, employee_id text, value text, PRIMARY KEY (employee_name, last_modified_date) ) We have made description text in the above table. I am wondering whether there are any limitations on the text data type in CQL, such as it can only hold a certain number of bytes and after that it will truncate? Any other limitations that I should know about? Should I use blob there?
Re: CQL Datatype in Cassandra
Forgot. The text value can be up to 2GB in size, but in practice it will be less. Thanks Jabbar Azam On 6 November 2013 21:12, Jabbar Azam aja...@gmail.com wrote: Hello Techy Teck, I couldn't find any evidence on the datastax website but found this http://wiki.apache.org/cassandra/CassandraLimitations which I believe is correct. Thanks Jabbar Azam On 6 November 2013 20:19, Techy Teck comptechge...@gmail.com wrote: We are using a CQL table like this - CREATE TABLE testing ( description text, last_modified_date timeuuid, employee_id text, value text, PRIMARY KEY (employee_name, last_modified_date) ) We have made description text in the above table. I am wondering whether there are any limitations on the text data type in CQL, such as it can only hold a certain number of bytes and after that it will truncate? Any other limitations that I should know about? Should I use blob there?
Re: videos of 2013 summit
http://www.youtube.com/playlist?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU Thanks Jabbar Azam On 4 Jul 2013 18:17, S Ahmed sahmed1...@gmail.com wrote: Hi, Are the videos online anywhere for the 2013 summit?
Re: Cassandra driver performance question...
Hello Tony, I couldn't reply earlier because I've been decorating over the weekend so have been a bit busy. Let me know what happens. Out of curiosity, why are you using JDBC and not a CQL3 native driver? Thanks Jabbar Azam On 24 Jun 2013 00:32, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I was able to get the performance issue resolved by reusing the connection object. It will be interesting to see what happens when I use a connection pool from an app server. I still think it would be a good idea to have a minimal mode for metadata. It is rare that I use metadata. Regards, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 9:33 PM *Subject:* Re: Cassandra driver performance question... Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching. Seems the metadata caching was moved from the connection object to the preparedStatement object. So I am wondering if the time difference I am seeing on the second preparedStatement object is because the metadata is cached then. So my question is how to test this theory? Is there a way to stop the metadata from coming across from Cassandra? A 20x performance improvement would be nice to have. Thanks, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org *Sent:* Friday, June 21, 2013 8:56 PM *Subject:* Re: Cassandra driver performance question... Thanks Jabbar, I ran nodetool as suggested and it showed 0 latency for the row count I have. I also ran the cli list command for the table hit by my JDBC preparedStatement and it was slow, like 121 msecs the first time I ran it and 40 msecs the second time, versus the jdbc call of 38 msecs to start with, unless I run it twice also and get 1.5-2.5 msecs for executeQuery the second time the preparedStatement is called.
I ran describe from the cli for the table and it said caching is ALL, which is correct. A real mystery and I need to understand better what is going on. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 3:32 PM *Subject:* Re: Cassandra driver performance question... Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says, it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using the jdbc driver and noticed that if I run the same query twice the second time it is much faster. I set up the row cache and column family cache and it did not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using a preparedStatement. Thanks!
Re: Cassandra driver performance question...
Hello Tony, This came out recently http://www.datastax.com/doc-source/developer/java-driver/index.html I can't vouch for the performance but the documentation is ok and it works. I'm using it on a side project myself. There is also Astyanax by Netflix and it also supports CQL 3 https://github.com/Netflix/astyanax/wiki/Getting-Started Thanks Jabbar Azam On 24 June 2013 15:34, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I am using the JDBC driver because almost no examples exist about what you mention. Even most of the JDBC examples I find do not work because they are incomplete or out of date. If you have a good reference about what you mentioned I can try it. As I mentioned, I got selects to work and now I am trying to get inserts to work via JDBC. I am running into issues there also but I will work at it till I get them to work. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org *Cc:* Tony Anecito adanec...@yahoo.com *Sent:* Monday, June 24, 2013 3:26 AM *Subject:* Re: Cassandra driver performance question... Hello Tony, I couldn't reply earlier because I've been decorating over the weekend so have been a bit busy. Let me know what happens. Out of curiosity, why are you using JDBC and not a CQL3 native driver? Thanks Jabbar Azam On 24 Jun 2013 00:32, Tony Anecito adanec...@yahoo.com wrote: Hi Jabbar, I was able to get the performance issue resolved by reusing the connection object. It will be interesting to see what happens when I use a connection pool from an app server. I still think it would be a good idea to have a minimal mode for metadata. It is rare that I use metadata. Regards, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 9:33 PM *Subject:* Re: Cassandra driver performance question... Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching.
Seems the metadata caching was moved from the connection object to the preparedStatement object. So I am wondering if the time difference I am seeing on the second preparedStatement object is because of the Metadata is cached then. So my question is how to test this theory? Is there a way to stop the metadata from coming accross from Cassandra? A 20x performance improvement would be nice to have. Thanks, -Tony *From:* Tony Anecito adanec...@yahoo.com *To:* user@cassandra.apache.org user@cassandra.apache.org *Sent:* Friday, June 21, 2013 8:56 PM *Subject:* Re: Cassandra driver performance question... Thanks Jabbar, I ran nodetool as suggested and it 0 latency for the row count I have. I also ran cli list command for the table hit by my JDBC perparedStatement and it was slow like 121msecs the first time I ran it and second time I ran it it was 40msecs versus jdbc call of 38msecs to start with unless I run it twice also and get 1.5-2.5msecs for executeQuery the second time the preparedStatement is called. I ran describe from cli for the table and it said caching is ALL which is correct. A real mystery and I need to understand better what is going on. Regards, -Tony *From:* Jabbar Azam aja...@gmail.com *To:* user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com *Sent:* Friday, June 21, 2013 3:32 PM *Subject:* Re: Cassandra driver performance question... Hello Tony, I would guess that the first queries data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and or the filesystem cache so it'll be faster. If you want to make it consistently faster having a key cache will definitely help. The following advice from Aaron Morton will also help You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. 
nodetool cfhistograms will show you the local read latency, this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using jdbc driver and noticed that if I run the same query twice the second time it is much faster. I setup the row cache and column family cache and it not seem to make a difference. I am wondering how to setup cassandra such that the first query is always as fast as the second one. The second one was 1.8msec and the first 28msec for the same exact paremeters. I am using preparestatement. Thanks!
Re: Cassandra terminates with OutOfMemory (OOM) error
Hello Mohammed, You should increase the heap space. You should also tune the garbage collection so young generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 with the G1 collector. It does a better job than me trying to optimise the JDK 6 GC collectors. Bear in mind though that the OS will need memory, as will the row cache and the file system. Memory usage will also depend on the workload of your system. I'm sure you'll also get good advice from other members of the mailing list. Thanks Jabbar Azam On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote: We have a 3-node Cassandra cluster on AWS. These nodes are running Cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings, so each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error.
Here is the error message: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60) at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126) at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132) at org.apache.cassandra.db.Table.getRow(Table.java:355) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052) at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main] java.lang.OutOfMemoryError: Java heap space at java.lang.Long.toString(Long.java:269) at java.lang.Long.toString(Long.java:764) at
org.apache.cassandra.dht.Murmur3Partitioner$1.toString(Murmur3Partitioner.java:171) at org.apache.cassandra.service.StorageService.describeRing(StorageService.java:1068) at org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:1192) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3766) at org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.getResult(Cassandra.java:3754) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The data in each column is less than 50 bytes. After adding all the column overheads (column name + metadata), it should not be more than 100 bytes. So reading 80,000 columns from 10 rows each means that we are reading 80,000 * 10 * 100 = 80 MB of data. It is large, but not large enough to fill up the 1.8 GB heap. So I wonder why the heap is getting full. If the data request is too big to fill
Re: Cassandra driver performance question...
Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The following advice from Aaron Morton will also help: You can also see what it looks like from the server side. nodetool proxyhistograms will show you full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times. If the proxyhistograms is showing most requests running faster than your app says, it's your app. http://mail-archives.apache.org/mod_mbox/cassandra-user/201301.mbox/%3ce3741956-c47c-4b43-ad99-dad8afc3a...@thelastpickle.com%3E Thanks Jabbar Azam On 21 June 2013 21:29, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am using the jdbc driver and noticed that if I run the same query twice the second time it is much faster. I set up the row cache and column family cache and it did not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second one. The second one was 1.8 msec and the first 28 msec for the same exact parameters. I am using a preparedStatement. Thanks!
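For completeness, per-table caching in the Cassandra 1.2 era is controlled through the table's caching property. A sketch only; my_keyspace.my_table is a placeholder, not Tony's actual schema:

```cql
-- 'KEYS_ONLY' enables just the key cache; 'ALL' adds the row cache too.
ALTER TABLE my_keyspace.my_table WITH caching = 'ALL';
```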
Re: CQL3 Driver, DESCRIBE
Hello Joe, I would use cqlsh and run DESCRIBE in there; I'm not sure why you want to run that from the driver! Thanks Jabbar Azam On 9 June 2013 23:49, Joe Greenawalt joe.greenaw...@gmail.com wrote: Hi, I was playing around with the datastax driver today, and I wanted to call DESCRIBE TABLE <table>;. But got a syntax error: line 1:0 no viable alternative at input 'describe'. Is that functionality just not implemented in the 1.0 driver? If that's true: Does anyone know if it's planned? Is there another way to get a description of the table? If it's not true, does anyone know where I could be doing something wrong? I have a good connection, and I'm simply running session.execute(DESCRIBE TABLE {keyspaceName}.{tableName};); Thanks, Joe
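For the Cassandra versions current at the time, one driver-side alternative worth noting: DESCRIBE is a cqlsh feature rather than a CQL statement, but the schema metadata it reads lives in the system keyspace and can be queried directly. A sketch, with placeholder keyspace and table names:

```cql
-- Table-level metadata for one table (keyspace/table names are placeholders).
SELECT * FROM system.schema_columnfamilies
WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable';

-- Column-level metadata for the same table.
SELECT * FROM system.schema_columns
WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable';
```

These are ordinary SELECT statements, so they can be run through session.execute() from the driver.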
Re: CQL3 Driver, DESCRIBE
Oops, I meant DESCRIBE TABLE ... Thanks Jabbar Azam On 10 June 2013 00:16, Jabbar Azam aja...@gmail.com wrote: Hello Joe, I would use cqlsh and run DESCRIBE in there; I'm not sure why you want to run that from the driver! Thanks Jabbar Azam On 9 June 2013 23:49, Joe Greenawalt joe.greenaw...@gmail.com wrote: Hi, I was playing around with the datastax driver today, and I wanted to call DESCRIBE TABLE <table>;. But got a syntax error: line 1:0 no viable alternative at input 'describe'. Is that functionality just not implemented in the 1.0 driver? If that's true: Does anyone know if it's planned? Is there another way to get a description of the table? If it's not true, does anyone know where I could be doing something wrong? I have a good connection, and I'm simply running session.execute(DESCRIBE TABLE {keyspaceName}.{tableName};); Thanks, Joe
Re: Is there anyone who implemented time range partitions with column families?
Hello Cem, You can get a similar effect by specifying a TTL value for data you save to a table. If the data becomes older than the TTL value then it will automatically be deleted by C*. Thanks Jabbar Azam On 29 May 2013 17:01, cem cayiro...@gmail.com wrote: Thank you very much for the fast answer. Does playORM use different column families for each partition in Cassandra? Cem On Wed, May 29, 2013 at 5:30 PM, Jeremy Powell jeremym.pow...@gmail.com wrote: Cem, yes, you can do this with C*, though you have to handle the logic yourself (other libraries might do this for you; I've seen the dev of playORM discuss some things which might be similar). We use Astyanax and programmatically create CFs based on a time period of our choosing that makes sense for our system, programmatically drop CFs if/when they are outside a certain time period (rather than using C*'s TTL), and write data to the different CFs as needed. ~Jeremy On Wed, May 29, 2013 at 8:36 AM, cem cayiro...@gmail.com wrote: Hi All, I used time range partitions 5 years ago with MySQL to clean up data much faster. I had a big FACT table with time range partitions and it was very easy to drop old partitions (with archiving) and save some space on disk. Has anyone implemented such a thing in Cassandra? It would be great if we have that in Cassandra. Best Regards, Cem.
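A minimal sketch of the TTL approach described above; the table and values are illustrative, not from the original thread:

```cql
-- Hypothetical fact table keyed by sensor and time.
CREATE TABLE fact_events (
    sensor_id text,
    event_time timestamp,
    payload text,
    PRIMARY KEY (sensor_id, event_time)
);

-- The inserted data expires automatically 90 days (7776000 seconds) later;
-- the expired data is physically removed during compaction.
INSERT INTO fact_events (sensor_id, event_time, payload)
VALUES ('sensor-1', '2013-05-29 17:00:00', 'reading')
USING TTL 7776000;
```

Unlike dropping a whole partition or column family, TTL expiry happens row by row, and the disk space is only reclaimed when the relevant SSTables are compacted.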
Re: Compaction causing OutOfHeap
Hello, I've noticed in an earlier 1.2.x release that if I had a compaction throughput throttle, some of the nodes would give an out of memory error, but only if I was inserting data for more than 10 hours continuously. The workaround was to switch off compaction throttling. This was in a test environment doing lots of inserts, so switching off compaction throttling was OK. Thanks Jabbar Azam On 27 May 2013 04:29, John Watson j...@disqus.com wrote: Having (2) 1.2.5 nodes constantly crashing due to OutOfHeap errors. It always happens when the same large compaction is about to finish (they re-run the same compaction after restarting.) An indicator is CMS GC time of 3-5s (and the many related problems felt throughout the rest of the cluster)
Lookup table structuring advice
Hello, I want to create a simple table holding user roles, e.g. create table roles ( name text, primary key(name) ); If I want to get a list of roles for some admin tool I can use the following CQL3: select * from roles; When a new name is added it will be stored on a different host, and doing a select * is going to be inefficient because the table will be stored across the cluster and each node will respond. The number of roles may be less than or just greater than a dozen. I'm not sure if I'm storing the roles correctly. The other thing I'm thinking about is that once I've read the roles I can cache them. Thanks Jabbar Azam
Re: Lookup table structuring advice
I never thought about using a synthetic key, but in this instance with about a dozen rows it's probably ok. Thanks for your great idea. Where did you read about the synthetic key idea? I've not come across it before. Thanks Jabbar Azam On 4 May 2013 19:30, Dave Brosius dbros...@mebigfatguy.com wrote: if you want to store all the roles in one row, you can do create table roles (synthetic_key int, name text, primary key(synthetic_key, name)) with compact storage when inserting roles, just use the same key insert into roles (synthetic_key, name) values (0, 'Programmer'); insert into roles (synthetic_key, name) values (0, 'Tester'); and use select * from roles where synthetic_key = 0; (or some arbitrary key value you decide to use) that way the data is stored on one node (and its replicas) of course if the number of roles grows to be large, you lose most of the value in having a cluster. On 05/04/2013 12:09 PM, Jabbar Azam wrote: Hello, I want to create a simple table holding user roles e.g. create table roles ( name text, primary key(name) ); If I want to get a list of roles for some admin tool I can use the following CQL3 select * from roles; When a new name is added it will be stored on a different host and doing a select * is going to be inefficient because the table will be stored across the cluster and each node will respond. The number of roles may be less than or just greater than a dozen. I'm not sure if I'm storing the roles correctly. The other thing I'm thinking about is that when I've read the roles once then I can cache them. Thanks Jabbar Azam
Re: cql query
Hello Sri, As far as I know you can, if name and age are part of your partition key and timestamp is the clustering key, e.g. create table columnfamily ( name varchar, age varchar, tstamp timestamp, primary key((name, age), tstamp) ); Thanks Jabbar Azam On 2 May 2013 11:45, Sri Ramya ramya.1...@gmail.com wrote: hi Can somebody tell me whether it is possible to do a multi-column query on cassandra like Select * from columnfamily where name='foo' and age ='21' and timestamp = 'unixtimestamp' ; Please give me some guidance for these kinds of queries Thank you
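A sketch of the full pattern suggested above; the table name is hypothetical, and note that both partition-key columns must be constrained with equality before the clustering column can be filtered:

```cql
CREATE TABLE user_events (
    name varchar,
    age varchar,
    tstamp timestamp,
    PRIMARY KEY ((name, age), tstamp)
);

-- name and age together form the partition key; tstamp is the
-- clustering key, so it accepts equality or range predicates.
SELECT * FROM user_events
WHERE name = 'foo' AND age = '21'
  AND tstamp >= '2013-05-01' AND tstamp < '2013-05-02';
```

With this layout the original three-column query works without secondary indexes, because the WHERE clause maps directly onto the primary key.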
Re: Cassandra multi-datacenter
I'm not sure why you want to use public IPs in the other datacentre. Your Cassandra nodes in the other datacentre will be accessible from the internet. Personally I would use private IP addresses in the second datacentre, on a different IP subnet. A VPN is your only solution if you want to keep your data private and unhackable, as it tunnels its way through the internet. A slow network connection will mean your data is not in sync in both datacentres unless you explicitly specify quorum as your consistency level in your mutation requests, but your database throughput will be affected by this. Your bandwidth to the second datacentre and the quantity of your mutation requests will dictate how long it will take the second datacentre to get in sync with the primary datacentre. I've probably missed something but there are plenty of intelligent people in this mailing list to fill in the blanks :) Thanks Jabbar Azam On 2 May 2013 20:28, Daning Wang dan...@netseer.com wrote: Hi all, We are deploying Cassandra on two data centers. There is a slower network connection between the data centers. It looks like Cassandra should use the internal IP to communicate with nodes in the same data center, and the public IP to talk to nodes in the other data center. We know VPN is a solution, but want to know if there is another idea. Thanks in advance, Daning
Re: Any experience of 20 node mini-itx cassandra cluster
I already have, thanks. I'll do the tests when the hardware arrives. Thanks Jabbar Azam On 16 April 2013 22:27, aaron morton aa...@thelastpickle.com wrote: Can't we use LCS? Do some reading and some tests… http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra http://www.datastax.com/dev/blog/when-to-use-leveled-compaction Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/04/2013, at 10:44 PM, Jabbar Azam aja...@gmail.com wrote: I know the SSDs are a bit small but they should be enough for our application. Our test data is 1.6 TB (including replication with RF=3). Can't we use LCS? This will give us more space at the expense of more I/O, but SSDs have loads of I/O. Thanks Jabbar Azam On 14 April 2013 20:20, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. 
It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: MySQL Cluster performing faster than Cassandra cluster on single table
MySQL Cluster also keeps the index in RAM, so with lots of rows the RAM becomes a limiting factor. That's what my colleague found, and hence why we're sticking with Cassandra. On 16 Apr 2013 21:05, horschi hors...@gmail.com wrote: Ah, I see, that makes sense. Have you got a source for the storing of hundreds of gigabytes? And does Cassandra not store anything in memory? It stores bloom filters and index samples in memory. But they are much smaller than the actual data and they can be configured. Yeah, my dataset is small at the moment - perhaps I should have chosen something larger for the work I'm doing (University dissertation), however, it is far too late to change now! On paper mysql-cluster looks great. But in daily use it's not as nice as Cassandra (where you have machines dying, networks splitting, etc.). cheers, Christian
Re: Any experience of 20 node mini-itx cassandra cluster
I know the SSDs are a bit small but they should be enough for our application. Our test data is 1.6 TB (including replication with RF=3). Can't we use LCS? This will give us more space at the expense of more I/O, but SSDs have loads of I/O. Thanks Jabbar Azam On 14 April 2013 20:20, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. 
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: Any experience of 20 node mini-itx cassandra cluster
Thanks Aaron. Thanks Jabbar Azam On 14 April 2013 19:39, aaron morton aa...@thelastpickle.com wrote: That's better. The SSD size is a bit small, and be warned that you will want to leave 50GB to 100GB free to allow room for compaction (using the default size tiered). On the RAM side you will want to run about 4GB (assuming cass 1.2) for the JVM; the rest can be off-heap Cassandra structures. This may not leave too much free space for the OS page cache, but SSD may help there. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 13/04/2013, at 4:47 PM, Jabbar Azam aja...@gmail.com wrote: What about using a quad core Athlon X4 740 3.2 GHz with 8GB of RAM and 256GB SSDs? I know it will depend on our workload but it will be better than a dual core CPU, I think. Jabbar Azam On 13 Apr 2013 01:05, Edward Capriolo edlinuxg...@gmail.com wrote: Dual core is not the greatest; you might run into GC issues before you run out of IO from your ssd devices. Also cassandra has other concurrency settings that are tuned roughly around the number of processors/cores. It is not uncommon to see 4-6 cores of CPU (600% in top) dealing with young-gen garbage, managing lots of sockets, and so on. On Fri, Apr 12, 2013 at 12:02 PM, Jabbar Azam aja...@gmail.com wrote: That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. 
Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: Anyway To Query Just The Partition Key?
With your example you can do an equality search on surname and city and then use IN with country, e.g. SELECT * FROM yourtable WHERE surname = 'blah' AND city = 'blah blah' AND country IN ('country1', 'country2'); Hope that helps. Jabbar Azam On 13 Apr 2013 07:06, Gareth Collins gareth.o.coll...@gmail.com wrote: Hello, If I have a cql3 table like this (I don't have a table with this data - this is just for example): create table ( surname text, city text, country text, event_id timeuuid, data text, PRIMARY KEY ((surname, city, country), event_id)); there is no way of (easily) getting the set (or a subset) of partition keys, is there (i.e. surname/city/country)? If I want easy access to do queries to get a subset of the partition keys, do I have to create another table? I am assuming yes but just making sure I am not missing something obvious here. thanks in advance, Gareth
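A sketch using the table from the question (the values are illustrative); note that CQL allows IN only on the last column of a composite partition key, with equality required on the earlier ones:

```cql
-- Table name added for illustration; the original post omitted it.
CREATE TABLE events (
    surname text,
    city text,
    country text,
    event_id timeuuid,
    data text,
    PRIMARY KEY ((surname, city, country), event_id)
);

-- surname and city take equality; country, as the last partition-key
-- component, may use IN to fan out over a small set of partitions.
SELECT * FROM events
WHERE surname = 'Smith'
  AND city = 'Leeds'
  AND country IN ('UK', 'Ireland');
```

This does not enumerate the partition keys themselves; for that, a separate index table is still the usual answer.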
Any experience of 20 node mini-itx cassandra cluster
Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam
Re: Any experience of 20 node mini-itx cassandra cluster
That's my guess. My colleague is still looking at CPUs so I'm hoping he can get quad core CPUs for the servers. Thanks Jabbar Azam On 12 April 2013 16:48, Colin Blower cblo...@barracuda.com wrote: If you have not seen it already, check out the Netflix blog post on their performance testing of AWS SSD instances. http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html My guess, based on very little experience, is that you will be CPU bound. On 04/12/2013 03:05 AM, Jabbar Azam wrote: Hello, I'm going to be building a 20 node cassandra cluster in one datacentre. The spec of the servers will roughly be dual core Celeron CPU, 256 GB SSD, 16GB RAM and two NICs. Has anybody done any performance testing with this setup or have any gotchas I should be aware of wrt the hardware? I do realise the CPU has fairly low computational power but I'm going to assume the system is going to be IO bound, hence the RAM and SSDs. Thanks Jabbar Azam -- Colin Blower Software Engineer Barracuda Networks Inc. +1 408-342-5576 (o)
Re: multiple Datacenter values in PropertyFileSnitch
Hello, I'm not an expert but I don't think you can do what you want. The way to separate data for applications on the same cluster is to use different tables for different applications or use multiple keyspaces, a keyspace per application. The replication factor you specify for each keyspace specifies how many copies of the data are stored in each datacenter. You can't specify that data for a particular application is stored on a specific node, unless that node is in its own cluster. I think of a cassandra cluster as a shared resource where all the applications have access to all the nodes in the cluster. Thanks Jabbar Azam On 11 April 2013 14:13, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote: Hi, I would like to create a big cluster for many applications. Within this cluster I would like to separate the data for each application, which can be easily done via different virtual datacenters and the correct replication strategy. What I would like to know is whether I can specify multiple values for 1 node in the PropertyFileSnitch configuration, so that I can use 1 node for more applications? For example: 6 nodes: 3 for App A, 3 for App B, 4 for App C. I want to have such a configuration: Node 1 – DC-A DC-C Node 2 – DC-B DC-C Node 3 – DC-A DC-C Node 4 – DC-B DC-C Node 5 – DC-A Node 6 – DC-B Is this possible or does anyone have another solution for this? Thx br matthias
Re: Two Cluster each with 12 nodes- Cassandra database
Hello, I don't know what Pelops is. I'm not sure why you want two clusters. I would have two clusters if I wanted data stored on totally separate servers, for perhaps security reasons. If you are going to have the servers in one location then you might as well have one cluster. You'll have the maximum aggregate IO of all the servers. If you're thinking of doing analytics as well then you can create two virtual datacentres: one for realtime inserts and reads and the second for analytics. You could have a 16/8 server split. Obviously you'll have to work out what the optimum split is for your workload. Not sure if I've answered your question... On 11 Apr 2013 18:51, Raihan Jamal jamalrai...@gmail.com wrote: Folks, Any thoughts on this? I am still in the learning process. So any guidance will be of great help. Raihan Jamal On Wed, Apr 10, 2013 at 10:39 PM, Raihan Jamal jamalrai...@gmail.com wrote: I have started working on a project in which I am using Cassandra database. Our production DBAs have set up two clusters and each cluster will have 12 nodes. I will be using the Pelops client to read the data from the Cassandra database. Now I am thinking what's the best way to create a Cluster using the Pelops client, like how many nodes I should add while creating the cluster? My understanding was to create the cluster with all the 24 nodes as I will be having two clusters each with 12 nodes. Is this the right approach? If not, then how do we decide which nodes (from each cluster) I should add while creating the cluster using the Pelops client? String[] nodes = cfg.getStringArray("cassandra.servers"); int port = cfg.getInt("cassandra.port"); boolean dynamicND = true; // dynamic node discovery Config casconf = new Config(port, true, 0); Cluster cluster = new Cluster(nodes, casconf, dynamicND); Pelops.addPool(Const.CASSANDRA_POOL, cluster, Const.CASSANDRA_KS); Can anyone help me out with this? Any help will be appreciated.
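A sketch of the virtual-datacentre idea above; the keyspace and datacentre names are illustrative and must match the DC names your snitch reports:

```cql
-- 'realtime' serves live inserts/reads; 'analytics' serves analytics jobs.
-- Replica counts per DC are examples only.
CREATE KEYSPACE myapp WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'realtime': 3,
    'analytics': 1
};
```

Clients for the realtime workload then connect only to the realtime DC's nodes (with a DC-aware load-balancing policy), while analytics jobs point at the analytics DC, so the two workloads don't compete for the same I/O.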
Re: Backup strategies in a multi DC cluster
Thank you for your feedback. I'll speak to the dev guys and come up with something appropriate. On 26 Mar 2013 17:51, aaron morton aa...@thelastpickle.com wrote: Assume you have four nodes and a snapshot is taken. The following day if a node goes down and data is corrupt through user error then how do you use the previous night's snapshots? Not sure what is corrupt: the snapshot/backup, or the data being incorrect through application error. Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. You would need to stop the entire cluster, and restore the snapshots on all nodes. If you restored the snapshot on just one node, new or old HW, it would have some data with an older timestamp than the other nodes. Cassandra would see this as an inconsistency, that the restored node missed some writes, and resolve the inconsistency with the most recent values. However if a virtual datacenter consisting of a backup node is used then the backup node could be used regardless of the number of nodes in the datacentre. It depends on the failure scenario and what you are trying to protect against. If you have 4 nodes and one node fails the best thing to do is start a new node and let cassandra stream the data from the other nodes. The new node could have the same token as the previous failed node. So long as the /var/lib/cassandra/data/system dir is empty (and the node is not a seed) it will join the cluster and ask the others for data. If you want to ensure availability then consider bigger clusters, e.g. 6 nodes with RF 3 allows you to lose up to 2 nodes and stay up. Or a higher RF. (see http://thelastpickle.com/2011/06/13/Down-For-Me/) It's tricky to protect against application error creating bad data using just backups. 
You may need to look at how you can replay events in your system and consider which parts of your data model should be directly mutated and which should be indirectly mutated by recording changes in another part of the model. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 25/03/2013, at 8:19 AM, Jabbar Azam aja...@gmail.com wrote: Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day if a node goes down and data is corrupt through user error then how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. However if a virtual datacenter consisting of a backup node is used then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions; I want to understand all the options. On 24 Mar 2013 17:45, aaron morton aa...@thelastpickle.com wrote: There are advantages and disadvantages in both approaches. What are people doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/03/2013, at 4:37 AM, Jabbar Azam aja...@gmail.com wrote: Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using test data) of data. That is ten years of data, after which we will start purging. 
This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3(Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
nodetool cleanup took about 23.5 hours on each node (did this in parallel). Started the nodetool cleanup at 20:53 on March 22 and it's still running (10:08, 25 March). The RF = 3. The load on each node is 490 GB, 491 GB, 323 GB, 476 GB. I think I read somewhere that removenode is faster the more nodes there are in the cluster. My next email will be the last in the thread. I thought the info might be useful to other people in the community. On 21 March 2013 21:59, Jabbar Azam aja...@gmail.com wrote: The nodetool cleanup command removes keys which can be deleted from the node the command is run on. So I'm assuming I can run nodetool cleanup on all the old nodes in parallel. Wouldn't do this on a live cluster as it's I/O intensive on each node. On 21 March 2013 17:26, Jabbar Azam aja...@gmail.com wrote: Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node (now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change the cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. 
I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change the cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use nodetool removenode if your server is broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation, for which I believe you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes effect. 
I would however test that all out in QA to make sure it works, and if you have QUORUM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed… you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org user
Re: cfhistograms
This also has a good description of how to interpret the results: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

On 25 March 2013 16:36, Brian Tarbox tar...@cabotresearch.com wrote:

I think we all go through this learning curve. Here is the answer I gave last time this question was asked:

The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output, should I read it as: my reads all took either 1 or 2 SSTables. And separately, I had write latencies of 3, 7, 19. And separately I had read latencies of 2, 8, 69, etc.? In other words... each row isn't really a row, i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0

On Mon, Mar 25, 2013 at 11:52 AM, Kanwar Sangha kan...@mavenir.com wrote:

Can someone explain how to read the cfhistograms o/p?

[root@db4 ~]# nodetool cfhistograms usertable data
usertable/data histograms
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1        2857444           4051             0         0        342711
2        6355104          27021             0         0        201313
3        2579941          61600             0         0        130489
4         374067         119286             0         0         91378
5          91752          10934             0         0         68548
6              0         321098             0         0         54479
7              0         476677             0         0         45427
8              0         734846             0         0         38814
10             0        2867967             4         0         65512
12             0        5366844            22         0         59967
14             0        6911431            36         0         63980
17             0       10155740           127         0        115714
20             0        7432318           302         0        138759
24             0        5231047           969         0        193477
29             0        2368553          2790         0        209998
35             0         859591          4385         0        204751
42             0         456978          3790         0        214658
50             0         306084          2465         0        151838
60             0         223202          2158         0         40277
72             0         122906          2896         0          1735

Thanks Kanwar

-- Thanks Jabbar Azam
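[Editor's note] Brian's point, that each column is its own histogram of (offset, count) pairs, also means you can compute approximate percentiles per column. A minimal sketch, assuming awk is available; the helper name and the inline data (the Read Latency column from the first table, in microseconds) are just for illustration:

```shell
# Approximate percentile of one cfhistograms column, treated as a
# standalone histogram of "offset:count" pairs.
percentile() {  # usage: percentile PCT "off:cnt off:cnt ..."
  pct=$1; shift
  echo "$1" | tr ' ' '\n' | awk -F: -v p="$pct" '
    { off[NR] = $1; cnt[NR] = $2; total += $2 }
    END {
      target = total * p / 100.0   # rank of the requested percentile
      run = 0
      for (i = 1; i <= NR; i++) {
        run += cnt[i]
        if (run >= target) { print off[i]; exit }
      }
    }'
}

# Read Latency column from the first table above:
percentile 50 "8:2 12:2 14:8 17:69 20:163 24:1369"   # prints 24
```

So the median read fell in the 24-microsecond bucket, even though most rows in the raw output are zeros: the zeros in other columns on the same row are unrelated, exactly as Brian says.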
Re: Backup strategies in a multi DC cluster
Thanks Aaron. I have a hypothetical question. Assume you have four nodes and a snapshot is taken. The following day, if a node goes down and data is corrupted through user error, how do you use the previous night's snapshots? Would you replace the faulty node first and then restore last night's snapshot? What happens if you don't have a replacement node? You won't be able to restore last night's snapshot. However, if a virtual datacentre consisting of a backup node is used, then the backup node could be used regardless of the number of nodes in the datacentre. Would there be any disadvantages to this approach? Sorry for the questions, I want to understand all the options. On 24 Mar 2013 17:45, aaron morton aa...@thelastpickle.com wrote: There are advantages and disadvantages in both approaches. What are people doing in their production systems? Generally a mix of snapshots+rsync or https://github.com/synack/tablesnap to get things off node. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/03/2013, at 4:37 AM, Jabbar Azam aja...@gmail.com wrote: Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB (includes replication, LZ4 compression, using test data) of data. That is ten years of data after which we will start purging. This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3 (Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? 
-- Thanks Jabbar Azam
Backup strategies in a multi DC cluster
Hello, I've been experimenting with cassandra for quite a while now. It's time for me to look at backups but I'm not sure what the best practice is. I want to be able to recover the data to a point in time before any user or software errors. We will have two datacentres with 4 servers and RF=3. Each datacentre will have at most 1.6 TB(includes replication, LZ4 compression, using test data) of data. That is ten years of data after which we will start purging. This amounts to about 400MB of data generation per day. I've read about users doing snapshots of individual nodes to S3(Netflix) and I've read about creating virtual datacentres ( http://www.datastax.com/dev/blog/multi-datacenter-replication) where each virtual datacentre contains a backup node. There are advantages and disadvantages in both approaches. What are people doing in their production systems? -- Thanks Jabbar Azam
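[Editor's note] For the snapshots+rsync route Aaron mentions, the nightly cycle on one node can be sketched roughly as below. The keyspace name, data path, and backup host are placeholders, not details from the thread; the nodetool calls are commented out because they need a running node:

```shell
# Nightly snapshot + offsite copy for one node (names are placeholders).
KEYSPACE="mykeyspace"
TAG="nightly_$(date +%Y%m%d)"
DATA_DIR="/var/lib/cassandra/data"
BACKUP_HOST="backup01"

# 1) Flush memtables and take a hard-link snapshot (cheap, no extra disk):
#      nodetool snapshot -t "$TAG" "$KEYSPACE"
# 2) Ship the snapshot directories off the node:
#      rsync -a "$DATA_DIR/$KEYSPACE"/*/snapshots/"$TAG"/ \
#            "$BACKUP_HOST:/backups/$(hostname)/$TAG/"
# 3) Drop the local snapshot once the copy is verified:
#      nodetool clearsnapshot -t "$TAG" "$KEYSPACE"
echo "snapshot tag: $TAG"
```

Point-in-time recovery before a user error then means restoring the last good night's SSTables rather than repairing from the live replicas, which is exactly why snapshots complement replication instead of replacing it.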
Re: cannot start Cassandra on Windows7
Hello Marina, I've downloaded a fresh copy of v1.2.3 and it's running fine on my Windows 7 64 bit PC. I am using jdk 1.6.0 u29 64 bit. I have local admin permissions to my PC. On 22 March 2013 15:36, Marina ppi...@yahoo.com wrote: Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit (CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! 
Thanks, Marina In case it helps, I have added echo of CASSANDRA_CLASSPATH: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server CASSANDRA_CLASSPATH=C:\Marina\Tools\DataStax Community\apache-cassandra\conf; C:\Marina\Tools\DataStax Community\apache-cassandra\lib\antlr-3.2.jar;C:\Marin a\Tools\DataStax Community\apache-cassandra\lib\apache-cassandra-1.2.2.jar;C:\ Marina\Tools\DataStax Community\apache-cassandra\lib\apache-cassandra-clientutil -1.2.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\apache-cass andra-thrift-1.2.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib \avro-1.4.0-fixes.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ avro-1.4.0-sources-fixes.jar;C:\Marina\Tools\DataStax Community\apache-cassand ra\lib\commons-cli-1.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\commons-codec-1.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\commons-lang-2.6.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\ lib\compress-lzf-0.8.4.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\concurrentlinkedhashmap-lru-1.3.jar;C:\Marina\Tools\DataStax Community\ap ache-cassandra\lib\guava-13.0.1.jar;C:\Marina\Tools\DataStax Community\apache- cassandra\lib\high-scale-lib-1.1.2.jar;C:\Marina\Tools\DataStax Community\apac he-cassandra\lib\jackson-core-asl-1.9.2.jar;C:\Marina\Tools\DataStax Community \apache-cassandra\lib\jackson-mapper-asl-1.9.2.jar;C:\Marina\Tools\DataStax Co mmunity\apache-cassandra\lib\jamm-0.2.5.jar;C:\Marina\Tools\DataStax Community \apache-cassandra\lib\jbcrypt-0.3m.jar;C:\Marina\Tools\DataStax Community\apac he-cassandra\lib\jline-1.0.jar;C:\Marina\Tools\DataStax Community\apache-cassa ndra\lib\json-simple-1.1.jar;C:\Marina\Tools\DataStax Community\apache-cassand ra\lib\libthrift-0.7.0.jar;C:\Marina\Tools\DataStax Community\apache-cassandra \lib\log4j-1.2.16.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ 
lz4-1.1.0.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\metrics- core-2.0.3.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\netty-3 .5.9.Final.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\servlet -api-2.5-20081211.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ slf4j-api-1.7.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\sl f4j-log4j12-1.7.2.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\ snakeyaml-1.6.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\snap py-java-1.0.4.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\lib\sn aptree-0.1.jar;C:\Marina\Tools\DataStax Community\apache-cassandra\build\class es\main;C:\Marina\Tools\DataStax Community\apache-cassandra\build\classes\thri ft Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. -- Thanks Jabbar Azam
Re: cannot start Cassandra on Windows7
Viktor, you're right. I didn't get any errors on my windows console but cassandra.yaml and log4j-server.properties need modifying. On 22 March 2013 15:44, Viktor Jevdokimov viktor.jevdoki...@adform.comwrote: You NEED to edit cassandra.yaml and log4j-server.properties paths before starting on Windows. There're a LOT of things to learn for starters. Google for Cassandra on Windows. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. -Original Message- From: Marina [mailto:ppi...@yahoo.com] Sent: Friday, March 22, 2013 17:21 To: user@cassandra.apache.org Subject: cannot start Cassandra on Windows7 Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. 
C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! Thanks, Marina -- Thanks Jabbar Azam
Re: cannot start Cassandra on Windows7
Oops, I also had opscenter installed on my PC. My changes:

log4j-server.properties file:

log4j.appender.R.File=c:/var/log/cassandra/system.log

cassandra.yaml file:

# directories where Cassandra should store data on disk.
data_file_directories:
    - c:/var/lib/cassandra/data
# commit log
commitlog_directory: c:/var/lib/cassandra/commitlog
# saved caches
saved_caches_directory: c:/var/lib/cassandra/saved_caches

I also added an environment variable for Windows called CASSANDRA_HOME. I needed to do this for one of my colleagues and now it's documented ;)

On 22 March 2013 15:47, Jabbar Azam aja...@gmail.com wrote: Viktor, you're right. I didn't get any errors on my windows console but cassandra.yaml and log4j-server.properties need modifying. On 22 March 2013 15:44, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: You NEED to edit cassandra.yaml and log4j-server.properties paths before starting on Windows. There're a LOT of things to learn for starters. Google for Cassandra on Windows. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. -Original Message- From: Marina [mailto:ppi...@yahoo.com] Sent: Friday, March 22, 2013 17:21 To: user@cassandra.apache.org Subject: cannot start Cassandra on Windows7 Hi, I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my Windows7 machine (I did not find a Windows-specific distributable...). 
Then, I tried to start Cassandra as following and got an error: C:\Marina\Tools\apache-cassandra-1.2.3\bincassandra.bat -f Starting Cassandra Server Exception in thread main java.lang.ExceptionInInitializerError Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration: log4j-server.properties at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemo n.java:81) at org.apache.cassandra.service.CassandraDaemon.clinit(CassandraDaemon .java:57) Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Pr ogram will exit. C:\Marina\Tools\apache-cassandra-1.2.3\bin It looks similar to the Cassandra issue that was already fixed: https://issues.apache.org/jira/browse/CASSANDRA-2383 however I am still getting this error I am an Administrator on my machine, and have access to all files in the apache- cassandra-1.2.3\conf dir, including the log4j ones. Do I need to configure anything else on Winows ? I did not find any Windows- specific installation/setup/startup instructions - if there are such documents somewhere, please let me know! Thanks, Marina -- Thanks Jabbar Azam -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 
2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can think of two ways of reintegrating the host into the cluster 1) shrink the cluster to three nodes and add the node into the cluster 2) Add the node into the cluster without shrinking I'm not sure of the best approach to take and I'm not sure how to achieve each step. Can anybody help? -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam -- Thanks Jabbar Azam
Re: Recovering from a faulty cassandra node
Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. 
( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can think of two ways of reintegrating the host into the cluster 1) shrink the cluster to three nodes and add the node into the cluster 2) Add the node into the cluster without shrinking I'm not sure of the best approach to take and I'm not sure how to achieve each step. Can anybody help? -- Thanks
Re: Recovering from a faulty cassandra node
nodetool cleanup command removes keys which can be deleted from the node the command is run. So I'm assuming I can run nodetool cleanup on all the old nodes in parallel. Wouldn't do this on a live cluster as it's I/O intensive on each node. On 21 March 2013 17:26, Jabbar Azam aja...@gmail.com wrote: Can I do a multiple node nodetool cleanup on my test cluster? On 21 Mar 2013 17:12, Jabbar Azam aja...@gmail.com wrote: All cassandra-topology.properties are the same. The node add appears to be successful. I can see it using nodetool status. I'm doing a node cleanup on the old nodes and then will do a node remove, to remove the old node. The actual node join took about 6 hours. The wiped node(now new node) has about 324 GB of files in /var/lib/cassandra On 21 March 2013 16:58, aaron morton aa...@thelastpickle.com wrote: Not sure if I needed to change cassandra-topology.properties file on the existing nodes. If you are using the PropertyFileSnitch all nodes need to have the same cassandra-topology.properties file. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/03/2013, at 1:34 AM, Jabbar Azam aja...@gmail.com wrote: I've added the node with a different IP address and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by remove node once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. Not sure if I needed to change cassandra-topology.properties file on the existing nodes. On 19 March 2013 15:49, Jabbar Azam aja...@gmail.com wrote: Do I use removenode before adding the reinstalled node or after? 
On 19 March 2013 15:45, Alain RODRIGUEZ arodr...@gmail.com wrote: In 1.2, you may want to use the nodetool removenode if your server i broken or unreachable, else I guess nodetool decommission remains the good way to remove a node. ( http://www.datastax.com/docs/1.2/references/nodetool) When this node is out, rm -rf /yourpath/cassandra/* on this serveur, change the configuration if needed (not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck. 2013/3/19 Hiller, Dean dean.hil...@nrel.gov Since you cleared out that node, it IS the replacement node. Dean From: Jabbar Azam aja...@gmail.commailto:aja...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Tuesday, March 19, 2013 9:29 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Recovering from a faulty cassandra node Hello Dean. I'm using vnodes so can't specify a token. In addition I can't follow the replace node docs because I don't have a replacement node. On 19 March 2013 15:25, Hiller, Dean dean.hil...@nrel.govmailto: dean.hil...@nrel.gov wrote: I have not done this as of yet but from all that I have read your best option is to follow the replace node documentation which I belive you need to 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer 2. Have the bootstrap option set or something so streaming takes affect. I would however test that all out in QA to make sure it works and if you have QUOROM reads/writes a good part of that test would be to take node X down after your node Y is back in the cluster to make sure reads/writes are working on the node you fixed…..you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range. 
Dean From: Jabbar Azam aja...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 19, 2013 8:51 AM To: user@cassandra.apache.org Subject: Recovering from a faulty cassandra node Hello, I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I waited for over a week to insert lots of data into the cluster. During the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault but the file system on that node is corrupt so I'll have to reinstall the OS and cassandra. I can
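[Editor's note] On a live cluster the safer pattern for the cleanup step discussed above is to run it serially rather than in parallel, since cleanup rewrites SSTables and is I/O intensive per node. A rough sketch; the host names are placeholders and the actual nodetool invocation is commented out because it needs a reachable cluster:

```shell
# Run cleanup one node at a time to bound the extra I/O load
# (host names are placeholders).
NODES="node1 node2 node3"
for n in $NODES; do
  echo "cleaning $n"
  # ssh "$n" nodetool cleanup   # blocks until this node's cleanup finishes
done
```

On a test cluster with no traffic, running them in parallel simply finishes sooner at the cost of simultaneous disk load everywhere.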
Re: Recovering from a faulty cassandra node
I've added the node with a different IP address, and after disabling the firewall data is being streamed from the existing nodes to the wiped node. I'll do a cleanup, followed by a removenode, once it's done. I've also added the new node to the existing nodes' cassandra-topology.properties file and restarted them. I also found I had iptables switched on and couldn't understand why the wiped node couldn't see the cluster. I'm not sure whether I needed to change the cassandra-topology.properties file on the existing nodes.

On 19 March 2013 15:49, Jabbar Azam <aja...@gmail.com> wrote:
> Do I use removenode before adding the reinstalled node, or after?
>
> On 19 March 2013 15:45, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>> In 1.2 you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the right way to remove a node (http://www.datastax.com/docs/1.2/references/nodetool). When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (I'm not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck.
>>
>> 2013/3/19 Hiller, Dean <dean.hil...@nrel.gov>
>>> Since you cleared out that node, it IS the replacement node.
>>> Dean

--
Thanks

Jabbar Azam
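The sequence described above (rejoin with a new IP, wait for streaming, then clean up and remove the old identity) can be sketched as a runbook. This is a hedged sketch only: the data path, service name, and host ID are illustrative assumptions, not values from the thread.

```shell
# On the wiped node: clear any stale state, then start Cassandra so it
# bootstraps under its new IP and streams data from the existing nodes.
# (/var/lib/cassandra is the default data path; yours may differ.)
sudo rm -rf /var/lib/cassandra/*
sudo service cassandra start

# On each existing node, after adding the new node to
# cassandra-topology.properties and restarting:
nodetool status            # wait until the new node shows as UN (Up/Normal)

# Once streaming completes, remove the dead node's old identity, then
# clean up data the surviving nodes no longer own.
nodetool removenode <old-host-id>   # placeholder; take the Host ID from 'nodetool status'
nodetool cleanup
```

Running cleanup only after the new node has fully joined avoids discarding ranges that are still being streamed.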
RE: java.lang.OutOfMemoryError: unable to create new native thread
Hello,

Also have a look at http://www.datastax.com/docs/1.2/install/recommended_settings

On 21 Mar 2013 00:06, S C <as...@outlook.com> wrote:
> Apparently max user processes was set very low on the machine.
> How to check? ulimit -u
> Set it to unlimited in /etc/security/limits.conf:
> * soft nproc unlimited
> * hard nproc unlimited

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: RE: java.lang.OutOfMemoryError: unable to create new native thread
Date: Fri, 15 Mar 2013 18:57:05 -0500
> I think I figured out where the issue is. I will keep you posted soon.

From: as...@outlook.com
To: user@cassandra.apache.org
Subject: java.lang.OutOfMemoryError: unable to create new native thread
Date: Fri, 15 Mar 2013 17:54:25 -0500
> I have a Cassandra node that is going down frequently with "java.lang.OutOfMemoryError: unable to create new native thread". It's a 16GB VM, of which 4GB is set as Xmx, and there are no other processes running on the VM. I have about 300 clients connecting to this node on average. I have no indication from vmstat/SAR that my VM has used more memory or is memory hungry, so this doesn't look like a memory issue to me. I'd appreciate any pointers. System specs: 2 CPU, 16GB, RHEL 6.2. Thank you.
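The fix above hinges on the per-user process limit: every JVM thread counts against `nproc`, so a low value produces "unable to create new native thread" even when the heap is fine. A quick check, with an illustrative (assumed) user name and value in the comments:

```shell
# Show the current per-user process/thread limit for this shell.
# A value in the low hundreds is too small for a busy Cassandra node.
ulimit -u

# To raise it persistently, add lines like these to
# /etc/security/limits.conf (the key is "nproc"; a generous fixed
# value is a common alternative to "unlimited"):
#   cassandra soft nproc 32768
#   cassandra hard nproc 32768
```

The new limit applies to sessions started after the change (via PAM), so Cassandra must be restarted from a fresh login for it to take effect.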
Recovering from a faulty cassandra node
Hello,

I am using Cassandra 1.2.2 on a 4 node test cluster with vnodes. I spent over a week inserting lots of data into the cluster. Towards the end of the process one of the nodes had a hardware fault. I have fixed the hardware fault, but the file system on that node is corrupt, so I'll have to reinstall the OS and Cassandra.

I can think of two ways of reintegrating the host into the cluster:
1) Shrink the cluster to three nodes, then add the node back into the cluster.
2) Add the node into the cluster without shrinking.

I'm not sure of the best approach to take, and I'm not sure how to achieve each step. Can anybody help?

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Hello Dean. I'm using vnodes so I can't specify a token. In addition, I can't follow the replace-node docs because I don't have a replacement node.

On 19 March 2013 15:25, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> I have not done this as of yet, but from all that I have read your best option is to follow the replace-node documentation, for which I believe you need to:
> 1. Have the token be the same BUT add 1 to it so it doesn't think it's the same computer.
> 2. Have the bootstrap option set so streaming takes effect.
> I would, however, test that all out in QA to make sure it works. If you have QUORUM reads/writes, a good part of that test would be to take node X down after your node Y is back in the cluster, to make sure reads/writes are working on the node you fixed... you just need to make sure node X shares one of the token ranges of node Y AND your writes/reads are in that token range.
> Dean

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Yes, you're probably right. I don't really understand token generation, so I was reluctant to do that. I'll install Linux on the faulty node now and let you know what happens.

On 19 March 2013 15:38, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> Since you cleared out that node, it IS the replacement node.
> Dean
>
> On Tuesday, March 19, 2013 9:29 AM, Jabbar Azam <aja...@gmail.com> wrote:
>> Hello Dean. I'm using vnodes so I can't specify a token. In addition, I can't follow the replace-node docs because I don't have a replacement node.

--
Thanks

Jabbar Azam
Re: Recovering from a faulty cassandra node
Do I use removenode before adding the reinstalled node, or after?

On 19 March 2013 15:45, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> In 1.2 you may want to use nodetool removenode if your server is broken or unreachable; otherwise I guess nodetool decommission remains the right way to remove a node (http://www.datastax.com/docs/1.2/references/nodetool). When this node is out, rm -rf /yourpath/cassandra/* on this server, change the configuration if needed (I'm not sure about the auto_bootstrap param) and start Cassandra on that node again. It should join the ring as a new node. Good luck.
>
> 2013/3/19 Hiller, Dean <dean.hil...@nrel.gov>
>> Since you cleared out that node, it IS the replacement node.
>> Dean

--
Thanks

Jabbar Azam
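Alain's advice above distinguishes two removal paths depending on the node's state. As a sketch (the host ID shown is a placeholder, not a value from the thread):

```shell
# Graceful path: the node is still up and healthy. Run ON the node
# being retired; it streams its data to the remaining nodes first.
nodetool decommission

# Broken/unreachable path: remove the dead node from ANY live node.
nodetool status     # note the Host ID of the node shown as DN (Down)
nodetool removenode 11111111-2222-3333-4444-555555555555  # placeholder ID
```

Either way, once the old identity is gone the wiped, reinstalled machine can start with empty data directories and rejoin the ring as a brand-new node.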
Re: Backup solution
Hello,

If the live data centre disappears, restoring the data from the backup is going to take ages, especially if the data has to travel from one data centre to another, unless you have a high-bandwidth connection between data centres or only a small amount of data.

Jabbar Azam

On 14 Mar 2013 14:31, Rene Kochen <rene.koc...@schange.com> wrote:
> Hi all,
>
> Is the following a good backup solution? Create two data-centers:
> - A live data-center with multiple nodes (commodity hardware). Clients connect to this cluster with LOCAL_QUORUM.
> - A backup data-center with 1 node (with fast SSDs). Clients do not connect to this cluster; it is only used for creating and storing snapshots.
>
> Advantages:
> - No snapshots and bulk network I/O (transferring snapshots) needed on the live cluster.
> - Clients are not slowed down, because writes to the backup data-center are async.
> - On the backup cluster, snapshots are made on a regular basis. This again does not affect the live cluster.
> - The backup cluster does not need to process client requests/reads, so we need fewer machines for the backup cluster than for the live cluster.
>
> Are there any disadvantages with this approach? Thanks!
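The two-data-centre layout Rene describes is driven by the keyspace's replication settings: with NetworkTopologyStrategy, LOCAL_QUORUM clients are acknowledged by the live DC only, while the backup DC receives its replica asynchronously. A sketch, where the keyspace name and DC names are assumptions, not from the thread:

```shell
# Hypothetical keyspace spanning a "live" DC (3 replicas) and a
# "backup" DC (1 replica on the snapshot node). DC names must match
# those reported by the snitch.
cqlsh -e "CREATE KEYSPACE app_data WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'live': 3,
  'backup': 1
};"
```

With this in place, `nodetool snapshot` can be run on the backup node alone, keeping snapshot I/O off the live cluster.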
Re: Compaction statistics information
Thanks Tyler

On 3 Mar 2013 18:55, Tyler Hobbs <ty...@datastax.com> wrote:
> It's a description of how many of the compacted SSTables the rows were spread across prior to compaction. In your case, 15 rows were spread across two of the four sstables, 68757 rows were spread across three of the four sstables, and 6865 were spread across all four.
>
> On Fri, Mar 1, 2013 at 11:07 AM, Jabbar Azam <aja...@gmail.com> wrote:
>> Hello, I'm seeing compaction statistics which look like the following:
>> INFO 17:03:09,216 Compacted 4 sstables to [/var/lib/cassandra/data/studata/datapoints/studata-datapoints-ib-629,]. 420,807,293 bytes to 415,287,150 (~98% of original) in 341,690ms = 1.159088MB/s. 233,761 total rows, 75,637 unique. Row merge counts were {1:0, 2:15, 3:68757, 4:6865, }
>> Does anybody know what "Row merge counts were {1:0, 2:15, 3:68757, 4:6865, }" means?
>>
>> --
>> Thanks
>>
>> A Jabbar Azam
>
> --
> Tyler Hobbs
> DataStax http://datastax.com/
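Tyler's reading of the merge histogram can be cross-checked against the other numbers in that log line: summing the buckets gives the unique (output) row count, and weighting each bucket by its sstable count gives the total row versions read.

```shell
# Each bucket k:n in {1:0, 2:15, 3:68757, 4:6865} means n output rows
# were merged from copies found in k of the 4 input sstables.
unique=$((0 + 15 + 68757 + 6865))          # rows written to the output sstable
total=$((1*0 + 2*15 + 3*68757 + 4*6865))   # row versions read from the inputs
echo "$unique unique, $total total"        # → 75637 unique, 233761 total
```

Both figures match the log line's "233,761 total rows, 75,637 unique", confirming the interpretation.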