Re: CQL 2, CQL 3 and Thrift confusion
Yup, that was exactly the cause. Somehow I could not figure out why it was downcasing my keyspace name all the time. It may be good to put it somewhere in the reference material with a more detailed explanation. On Sun, Sep 23, 2012 at 9:30 PM, Sylvain Lebresne sylv...@datastax.com wrote: In CQL3, names are case insensitive by default, while they were case sensitive in CQL2. You can however force whatever case you want in CQL3 using double quotes. So in other words, in CQL3, USE "TestKeyspace"; should work as expected. -- Sylvain On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote: Hi, I'm currently using Cassandra 1.1.5. When I create a keyspace from CQL 2 with the command (`cqlsh -2`): CREATE KEYSPACE TestKeyspace WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 1 and then try to access it from CQL 3 (`cqlsh -3`): USE TestKeyspace; I get an error: Bad Request: Keyspace 'testkeyspace' does not exist The same thing applies to the Thrift interface. Somehow, I can only access keyspaces created from CQL 2 via the Thrift interface. Basically, I get the same exact error: InvalidRequestException(why:There is no ring for the keyspace: CascadingCassandraCql3) Am I missing some switch? Or maybe it is intended to work that way?... Thanks! -- alex p
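For reference, a minimal sketch of what the double quotes change in CQL3 (the keyspace name is the one from this thread; the CQL2-created keyspace keeps its original mixed case):

  USE TestKeyspace;     -- unquoted: CQL3 folds the name to 'testkeyspace', hence the Bad Request
  USE "TestKeyspace";   -- quoted: case is preserved, so the CQL2-created keyspace is found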
Re: compression
Thanks all, that helps. Will start with one - two CFs and let you know the effect *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sun, Sep 23, 2012 at 8:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: As well as your unlimited column names may all have the same prefix, right? Like accounts.rowkey56, accounts.rowkey78, etc. etc. so the accounts gets a ton of compression then. Later, Dean From: Tyler Hobbs ty...@datastax.com Reply-To: user@cassandra.apache.org Date: Sunday, September 23, 2012 11:46 AM To: user@cassandra.apache.org Subject: Re: compression column metadata, you're still likely to get a reasonable amount of compression. This is especially true if there is some amount of repetition in the column names, values, or TTLs in wide rows. Compression will almost always be beneficial unless you're already somehow CPU bound or are using large column values that are high in entropy, such as pre-compressed or encrypted data.
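To make the setup concrete: enabling Snappy compression with a 64 KB chunk length on an existing column family looked roughly like this in the 1.1-era cassandra-cli (the CF name is a placeholder and option names may vary slightly by version, so treat this as a sketch rather than a definitive reference):

  update column family Documents
    with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

As far as I understand, existing sstables are only rewritten in compressed form when they are next compacted or when nodetool scrub / nodetool upgradesstables is run, so the reported ratio lags behind the setting change.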
Re: any ways to have compaction use less disk space?
Why so? What are the pluses and minuses? As for me, I am looking at the number of files in the directory. 700GB/512MB*5 (files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 700,000 files, that is too much for a single directory, too much memory used for SST data, too huge a compaction queue (that leads to strange pauses, I suppose because of the compactor thinking about what to compact next),... 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This will not only allow you to fill more space, but will also shrink your data much faster in case of updates. Size compaction can give you 3x-4x more space used than there is live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after the 4th week will they all be compacted into one 300GB SSTable. Leveled compaction has tamed space for us. Note that you should set sstable_size_in_mb to a reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me - especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn
DunDDD NoSQL and Big Data
Hi All, I'm organising the NoSQL and Big Data track at Developer Day Dundee: http://dun.dddscotland.co.uk/ This is a free mini conference at Dundee University, Dundee, Scotland. For the past 2 years we've had a track on NoSQL and had some great speakers. However I don't believe we've had anyone from the Cassandra community join us and give a talk. If you're interested, drop me a line and let me know what you're proposing. I should point out, as this is a free conference, we can't pay speakers and unless we get a big sponsor, it's doubtful we can manage much in the way of expenses! Andy Cobley Program director, MSc Business Intelligence and Data Science School of Computing University of Dundee http://www.computing.dundee.ac.uk/ The University of Dundee is a Scottish Registered Charity, No. SC015096.
Cassandra failures while moving token
Hi, The problem is that while we move a token in a 12-node cluster, we observe Cassandra misses (no data returned by Cassandra for the requested row key). Our understanding is that when we move a token, the node will first sync up the data for its newly assigned range, and only after that will it receive requests for the new range. So we are not sure why the cluster gives a miss as soon as we move a token. Is there any way or utility to tell which node a particular row key is fetched from, so we can ensure that the token move completed fine, that the data is lying on the correct new node, and that it is being looked up by the cluster on the correct node? Or, please tell us the best way to change the tokens in the cluster. Thanks Regards Shashilpi Krishan
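On the "which node serves a given row key" question, one utility worth trying (assuming your version's nodetool has it; keyspace, column family and key below are placeholders):

  nodetool -h <host> getendpoints <keyspace> <column_family> <row_key>

It prints the replica endpoints that own that key under the current ring, and comparing its output with nodetool ring before and after the move helps confirm that the data ended up where the cluster will look for it.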
workarounds for https://issues.apache.org/jira/browse/CASSANDRA-3741
Are there any tested patches around for fixing this issue in 1.0 branch? I have to do keyspace wide flush every 30 seconds to survive delete-only workload. This is very inefficient. https://issues.apache.org/jira/browse/CASSANDRA-3741
Nodetool repair and Leveled Compaction
Hi Guys, We've noticed strange behavior on our 3-node staging Cassandra cluster with RF=2 and LeveledCompactionStrategy. When we run nodetool repair keyspace cfname -pr on a node, the other nodes start the validation process, and when that process finishes one of the other 2 nodes reports several hundred pending compaction tasks, and the total disk space used by the column family (in JMX) is doubled. The repair process itself runs fine in the background, but the issue I'm concerned about is the large number of seemingly unnecessary compaction tasks and the doubled disk space on one of the good nodes. Is such behavior by design, or is it a bug?
Re: Varchar indexed column and IN(...)
On Sun, Sep 23, 2012 at 11:30 PM, aaron morton aa...@thelastpickle.com wrote: If this is intended behavior, could somebody please point me to where this is documented? It is intended. It is not in fact. We should either refuse the query as yet unsupported or we should do the right thing, but returning nothing silently is wrong. I've created https://issues.apache.org/jira/browse/CASSANDRA-4709 to fix that. -- Sylvain
Re: downgrade from 1.1.4 to 1.0.X
On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote: No. They use different minor file versions which are not backwards compatible. Thanks Aaron. Is upgradesstables capable of downgrading the files to 1.0.8? Looking for a way to make this work. Regards, Arend-Jan On 18/09/2012, at 11:18 PM, Arend-Jan Wijtzes ajwyt...@wise-guys.nl wrote: Hi, We are running Cassandra 1.1.4 and like to experiment with Datastax Enterprise which uses 1.0.8. Can we safely downgrade a production cluster or is it incompatible? Any special steps involved? -- Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl
[BETA RELEASE] Apache Cassandra 1.2.0-beta1 released
The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 1.2.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will become Cassandra 1.2 and to get wider testing before the final release. As such, it is likely not bug free but all help in testing this beta would be greatly appreciated and will help make 1.2 a solid release. So please report any problem you may encounter[3,4] with this release. Have a look at the change log[1] and the release notes[2] to see where Cassandra 1.2 differs from the previous series. Apache Cassandra 1.2.0-beta1[5] is available as usual from the cassandra website (http://cassandra.apache.org/download/) and a debian package is available using the 12x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing and have fun with it. [1]: http://goo.gl/qhh8h (CHANGES.txt) [2]: http://goo.gl/Pu9kh (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta1
Re: Correct model
2012/9/23 Hiller, Dean dean.hil...@nrel.gov You need to split data among partitions or your query won't scale as more and more data is added to the table. Having the partition means you are querying a lot fewer rows. This will happen in case I can query just one partition. But if I need to query things in multiple partitions, wouldn't it be slower? He means determine the ONE partition key and query that partition. I.e. if you want just the latest user requests, figure out the partition key based on which month you are in and query it. If you want the latest independent of user, query the correct single partition for the GlobalRequests CF. But in this case, I didn't understand Aaron's model then. My first query is to get all requests for a user. If I did partitions by time, I will need to query all partitions to get the results, right? In his answer it was said I would query ONE partition... If I want all the requests for the user, couldn't I just select all UserRequest records which start with userId? He designed it so the user requests table was completely scalable, so he has partitions there. If you don't have partitions, you could run into a row that is too long. You don't need to design it this way if you know none of your users are going to go into the millions as far as number of requests. In his design then, you need to pick the correct partition and query into that partition. You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something not performant in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows... I really didn't understand why to use partitions. Partitions are a way, if you know your rows will go into the trillions, of breaking them up so each partition has 100k rows or so, or even 1 million, but maxes out in the millions most likely. Without partitions, you hit a limit in the millions. With partitions, you can keep scaling past that as you can have as many partitions as you want. If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node? I thought Cassandra would automatically distribute my data among nodes as I insert rows into a CF. Of course if I use partitions I understand I could query just one partition (node) to get the data, if I have the partition field, but to the best of my knowledge, this is not what happens in my case, right? In the first query I would have to query all the partitions... Or are you saying partitions have nothing to do with nodes?? If 99.999% of my users will have less than 100k requests, would it make sense to partition by user? A multi-get is a query that finds IN PARALLEL all the rows with the matching keys you send to cassandra. If you do 1000 gets (instead of a multi-get) with 1ms latency, you will find it takes 1 second + processing time. If you do ONE multi-get, you only have 1 request and therefore 1ms latency. The other solution is you could send 1000 async gets, but I have a feeling that would be slower with all the marshalling/unmarshalling of the envelope…..it really depends on the envelope size; if we were using http, you would get killed doing 1000 requests instead of 1 with 1000 keys in it. That's cool! :D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right? Out of curiosity, if each get will occur on a different node, would I need to connect to each of the nodes?
Or would I query 1 node and it would communicate to others? Later, Dean From: Marcelo Elias Del Valle mvall...@gmail.commailto: mvall...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Sunday, September 23, 2012 10:23 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Correct model 2012/9/20 aaron morton aa...@thelastpickle.commailto: aa...@thelastpickle.com I would consider: # User CF * row_key: user_id * columns: user properties, key=value # UserRequests CF * row_key: user_id : partition_start where partition_start is the start of a time partition that makes sense in your domain. e.g. partition monthly. Generally want to avoid rows the grow forever, as a rule of thumb avoid rows more than a few 10's of MB. * columns: two possible approaches: 1) If the requests are immutable and you generally want all of the data store the request in a single column using JSON or similar, with the column name a timestamp. 2) Otherwise use a composite column name of timestamp : request_property to store the request in many columns. * In either case consider using Reversed comparators so the most recent columns are first see
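To make the UserRequests idea above concrete, here is a rough CQL3 rendering of it using 1.2-style composite partition keys (table and column names are made up for illustration and are not from Aaron's mail):

  CREATE TABLE user_requests (
      user_id      text,
      month_bucket text,        -- the time partition, e.g. '2012-09'
      request_time timeuuid,
      request_json text,        -- option 1: immutable request stored as a single blob
      PRIMARY KEY ((user_id, month_bucket), request_time)
  ) WITH CLUSTERING ORDER BY (request_time DESC);

The (user_id, month_bucket) pair plays the role of the user_id : partition_start row key, and the reversed clustering order matches the reversed-comparator suggestion so the most recent requests come back first.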
Re: Nodetool repair and Leveled Compaction
Repair process by itself is going well in the background, but the issue I'm concerned about is a lot of unnecessary compaction tasks. The number in the compaction tasks counter is overestimated. For example, I have 1100 tasks left, and if I stop inserting data, all tasks finish within 30 minutes. I suppose this counter is incremented for every sstable which needs compaction, but it's not decremented properly, because you can compact about 20 sstables at once and that reduces the counter only by 1.
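For anyone watching this in practice, the live view of that counter is nodetool compactionstats (host is a placeholder):

  nodetool -h <host> compactionstats

It lists the compactions currently running plus the pending tasks estimate, which is the over-counted number discussed above.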
Re: Correct model
I am confused. In this email you say you want get all requests for a user and in a previous one you said Select all the users which has new requests, since date D so let me answer both… For latter, you make ONE query into the latest partition(ONE partition) of the GlobalRequestsCF which gives you the most recent requests ALONG with the user ids of those requests. If you queried all partitions, you would most likely blow out your JVM memory. For the former, you make ONE query to the UserRequestsCF with userid = your user id to get all the requests for that user You mean too many rows, not a row too long, right? I am assuming each request will be a different row, not a new column. Is having billions of ROWS something non performatic in Cassandra? I know Cassandra allows up to 2 billion columns for a CF, but I am not aware of a limitation for rows… Sorry, I was skipping some context. A lot of the backing indexing sometimes is done as a long row so in playOrm, too many rows in a partition means == too many columns in the indexing row for that partition. I believe the same is true in cassandra for their indexing. If I understood it correctly, if I don't specify partitions, Cassandra will store all my data in a single node? Cassandra spreads all your data out on all nodes with or without partitions. A single partition does have it's data co-located though. I 99,999% of my users will have less than 100k requests, would it make sense to partition by user? If you are at 100k(and the requests are rather small), you could embed all the requests in the user or go with Aaron's below suggestion of a UserRequestsCF. If your requests are rather large, you probably don't want to embed them in the User. Either way, it's one query or one row key lookup. That's cool! :D So if I need to query data split in 10 partitions, for instance, I can perform the query in parallel by using a multiget, right? Multiget ignores partitions…you feed it a LIST of keys and it gets them. It just so happens that partitionId had to be part of your row key. Out of curiosity, if each get will occur on a different node, I would need to connect to each of the nodes? Or would I query 1 node and it would communicate to others? I have used Hector and now use Astyanax, I don't worry much about that layer, but I feed astyanax 3 nodes and I believe it discovers some of the other ones. I believe the latter is true but am not 100% sure as I have not looked at that code. As an analogy on the above, if you happen to have used PlayOrm, you would ONLY need one Requests table and you partition by user AND time(two views into the same data partitioned two different ways) and you can do exactly the same thing as Aaron's example. PlayOrm doesn't embed the partition ids in the key leaving it free to partition twice like in your case….and in a refactor, you have to map/reduce A LOT more rows because of rows having the FK of partitionidsubrowkey whereas if you don't have partition id in the key, you only map/reduce the partitioned table in a redesign/refactor. That said, we will be adding support for CQL partitioning in addition to PlayOrm partitioning even though it can be a little less flexible sometimes. Also, CQL locates all the data on one node for a partition. We have found it can be faster sometimes with the parallelized disks that the partitions are NOT all on one node so PlayOrm partitions are virtual only and do not relate to where the rows are stored. 
An example on our 6 nodes was a join query on a partition with 1,000,000 rows took 60ms (of course I can't compare to CQL here since it doesn't do joins). It really depends how much data is going to come back in the query though too? There are tradeoff's between disk parallel nodes and having your data all on one node of course. Later, Dean From: Marcelo Elias Del Valle mvall...@gmail.commailto:mvall...@gmail.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Monday, September 24, 2012 7:45 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Correct model 2012/9/23 Hiller, Dean dean.hil...@nrel.govmailto:dean.hil...@nrel.gov You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying a lot less rows. This will happen in case I can query just one partition. But if I need to query things in multiple partitions, wouldn't it be slower? He means determine the ONE partition key and query that partition. Ie. If you want just latest user requests, figure out the partition key based on which month you are in and query it. If you want the latest independent of user, query the correct single partition for GlobalRequests CF. But in this case, I didn't understand Aaron's model
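As a concrete picture of the multi-get, in CQL3 terms it is a key-IN query against the partitioned table sketched earlier in the thread (names are still the made-up ones; Thrift clients expose the same operation as multiget_slice):

  SELECT * FROM user_requests
   WHERE user_id = 'user42'
     AND month_bucket IN ('2012-08', '2012-09');

One round trip from the client, and the coordinator node fans the lookups out to whichever replicas own each partition.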
Re: Correct model
2012/9/24 Hiller, Dean dean.hil...@nrel.gov I am confused. In this email you say you want get all requests for a user and in a previous one you said Select all the users which has new requests, since date D so let me answer both… I have both needs. These are the two queries I need to perform on the model. For latter, you make ONE query into the latest partition(ONE partition) of the GlobalRequestsCF which gives you the most recent requests ALONG with the user ids of those requests. If you queried all partitions, you would most likely blow out your JVM memory. For the former, you make ONE query to the UserRequestsCF with userid = your user id to get all the requests for that user Now I think I got the main idea! This answered a lot! Sorry, I was skipping some context. A lot of the backing indexing sometimes is done as a long row so in playOrm, too many rows in a partition means == too many columns in the indexing row for that partition. I believe the same is true in cassandra for their indexing. Oh, ok, you were talking about the wide row pattern, right? But playORM is compatible with Aaron's model, isn't it? Can I map exactly this using playORM? The hardest thing for me to use playORM now is I don't know Cassandra well yet, and I know playORM even less. Can I ask playOrm questions in this list? I will try to create a POC here! Only now I am starting to understand what it does ;-) The examples directory is empty for now, I would like to see how to set up the connection with it. Cassandra spreads all your data out on all nodes with or without partitions. A single partition does have it's data co-located though. Now I see. The main advantage of using partitions is keeping the indexes small enough. It has nothing to do with the nodes. Thanks! If you are at 100k(and the requests are rather small), you could embed all the requests in the user or go with Aaron's below suggestion of a UserRequestsCF. If your requests are rather large, you probably don't want to embed them in the User. Either way, it's one query or one row key lookup. I see it now. Multiget ignores partitions…you feed it a LIST of keys and it gets them. It just so happens that partitionId had to be part of your row key. Do you mean I need to load all the keys in memory to do a multiget? I have used Hector and now use Astyanax, I don't worry much about that layer, but I feed astyanax 3 nodes and I believe it discovers some of the other ones. I believe the latter is true but am not 100% sure as I have not looked at that code. Why did you move? Hector is being considered for being the official client for Cassandra, isn't it? I looked at the Astyanax api and it seemed much more high level though As an analogy on the above, if you happen to have used PlayOrm, you would ONLY need one Requests table and you partition by user AND time(two views into the same data partitioned two different ways) and you can do exactly the same thing as Aaron's example. PlayOrm doesn't embed the partition ids in the key leaving it free to partition twice like in your case….and in a refactor, you have to map/reduce A LOT more rows because of rows having the FK of partitionidsubrowkey whereas if you don't have partition id in the key, you only map/reduce the partitioned table in a redesign/refactor. That said, we will be adding support for CQL partitioning in addition to PlayOrm partitioning even though it can be a little less flexible sometimes. I am not sure I understood this part. 
If I need to refactor, having the partition id in the key would be a bad thing? What would be the alternative? In my case, as I use userId : partitionId as row key, this might be a problem, right? Also, CQL locates all the data on one node for a partition. We have found it can be faster sometimes with the parallelized disks that the partitions are NOT all on one node so PlayOrm partitions are virtual only and do not relate to where the rows are stored. An example on our 6 nodes was a join query on a partition with 1,000,000 rows took 60ms (of course I can't compare to CQL here since it doesn't do joins). It really depends how much data is going to come back in the query though too? There are tradeoff's between disk parallel nodes and having your data all on one node of course. I guess I am still not ready for this level of info. :D In the playORM readme, we have the following: @NoSqlQuery(name=findWithJoinQuery, query=PARTITIONS t(:partId) SELECT t FROM TABLE as t + INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares :shares), What would happen behind the scenes when I execute this query? You can only use joins with partition keys, right? In this case, is partId the row id of TABLE CF? Thanks a lot for the answers -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
RE: Cassandra Counters
Hi folks, I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my queries pointwise. a) What are the performance tradeoffs on reads and writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters? b) What is the current state of counter limitations in the latest version of Apache Cassandra? c) With there being a possibility of counter values getting out of sync, would counters not be recommended where strong consistency is desired? The normal benefits of Cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. Regards, roshni From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi, I'm trying to understand if counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items, to which I can add or delete a set of items at a time, and I want a count of the items, without considering changing the database or additional components like zookeeper. I have 2 options: the first is a counter col family, and the second is a standard one.

1. List_Counter_CF (row key ListId, a single counter column):
   ListId -> TotalItems = 50

2. List_Std_CF (row key ListId, one column per change):
   ListId -> TimeUUID1 = 3, TimeUUID2 = 70, TimeUUID3 = -20, TimeUUID4 = 3, TimeUUID5 = -6

And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, I'd need to read the row, slice through all columns and add them. In both cases the writes should be fast; in fact the standard col family should be faster as there's no read before write. And for a CL ONE write the latency should be the same. For reads, the first option is very good, just read one column for a key. For the second, the read involves reading the row and adding each column value via application code. I don't think there's a way to do math via CQL yet. There should be no hot spotting, if the key is sharded well. I could even maintain the count derived from the List_Std_CF in a separate column family which is a standard col family with the final number, but I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be...? Is there any number around after how many columns the performance starts deteriorating, or how much worse in performance it would be? The advantage I see is that I can use the same consistency rules as for the rest of the column families. If quorum for reads and writes, then you get strongly consistent values. In case of counters I see that in case of timeout exceptions, because the first replica is down or not responding, there's a chance of the values getting messed up, and re-trying can mess it up further. It's not idempotent like a standard col family design can be. If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?). I believe the rest of the limitations still hold good - has anything changed in recent versions?
In my opinion, they are not as major as the consistency question:
- removing a counter then modifying the value - behaviour is undetermined
- special process for counter col family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes
In short, I'd recommend counters for analytics, or when dealing with data where the exact numbers are not important, or when it's OK to take some time to fix a mismatch and the performance requirements are most important. However, where the numbers should match, it's better to use a std column family and a manual implementation. Please share your thoughts on this. Regards, roshni
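To pin down the two options in code, here is a hedged CQL3 sketch of both write paths (1.2-era syntax; table names roughly follow the ones above):

  -- option 1: counter column family, one increment or decrement per change
  CREATE TABLE list_counter_cf (list_id text PRIMARY KEY, total_items counter);
  UPDATE list_counter_cf SET total_items = total_items + 3 WHERE list_id = 'list1';

  -- option 2: standard column family, one new column per change, summed at read time
  CREATE TABLE list_std_cf (list_id text, change_time timeuuid, delta int,
                            PRIMARY KEY (list_id, change_time));
  INSERT INTO list_std_cf (list_id, change_time, delta) VALUES ('list1', now(), 3);

The counter increment is not idempotent, so a timed-out increment cannot safely be retried, which is exactly the consistency concern raised above; the standard-CF write can be retried safely as long as the client reuses the same TimeUUID, but it moves the summing work to the reader.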
Re: Correct model
Oh, ok, you were talking about the wide row pattern, right? yes But playORM is compatible with Aaron's model, isn't it? Not yet, PlayOrm supports partitioning one table multiple ways as it indexes the columns(in your case, the userid FK column and the time column) Can I map exactly this using playORM? Not yet, but the plan is to map these typical Cassandra scenarios as well. Can I ask playOrm questions in this list? The best place to ask PlayOrm questions is on stack overflow and tag with PlayOrm though I monitor this list and stack overflow for questions(there are already a few questions on stack overflow). The examples directory is empty for now, I would like to see how to set up the connection with it. Running build or build.bat is always kept working and all 62 tests pass(or we don't merge to master) so to see how to make a connection or run an example 1. Run build.bat or build which generates parsing code 2. Import into eclipse (it already has .classpath and .project for you already there) 3. In FactorySingleton.java you can modify IN_MEMORY to CASSANDRA or not and run any of the tests in-memory or against localhost(We run the test suite also against a 6 node cluster as well and all passes) 4. FactorySingleton probably has the code you are looking for plus you need a class called nosql.Persistence or it won't scan your jar file.(class file not xml file like JPA) Do you mean I need to load all the keys in memory to do a multi get? No, you batch. I am not sure about CQL, but PlayOrm returns a Cursor not the results so you can loop through every key and behind the scenes it is doing batch requests so you can load up 100 keys and make one multi get request for those 100 keys and then can load up the next 100 keys, etc. etc. etc. I need to look more into the apis and protocol of CQL to see if it allows this style of batching. PlayOrm does support this style of batching today. Aaron would know if CQL does. Why did you move? Hector is being considered for being the official client for Cassandra, isn't it? At the time, I wanted the file streaming feature. Also, Hector seemed a bit cumbersome as well compared to astyanax or at least if you were building a platform and had no use for typing the columns. Just personal preference really here. I am not sure I understood this part. If I need to refactor, having the partition id in the key would be a bad thing? What would be the alternative? In my case, as I use userId : partitionId as row key, this might be a problem, right? PlayOrm indexes the columns you choose(ie. The ones you want to use in the where clause) and partitions by columns you choose not based on the key so in PlayOrm, the key is typically a TimeUUID or something cluster unique…..any tables referencing that TimeUUID never have to change. With Cassandra partitioning, if you repartition that table a different way or go for some kind of major change(usually done with map/reduce), all your foreign keys may have to change….it really depends on the situation though. Maybe you get the design right and never have to change. @NoSqlQuery(name=findWithJoinQuery, query=PARTITIONS t(:partId) SELECT t FROM TABLE as t + INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares :shares), What would happen behind the scenes when I execute this query? In this case, t or TABLE is a partitioned table since a partition is defined. 
And t.activityTypeInfo refers to the ActivityTypeInfo table which is not partitioned(AND ActivityTypeInfo won't scale to billions of rows because there is no partitioning but maybe you don't need it!!!). Behind the scenes when you call getResult, it returns a cursor that has NOT done anything yet. When you start looping through the cursor, behind the scenes it is batching requests asking for next 500 matches(configurable) so you never run out of memory….it is EXACTLY like a database cursor. You can even use the cursor to show a user the first set of results and when user clicks next pick up right where the cursor left off (if you saved it to the HttpSession). You can only use joins with partition keys, right? Nope, joins work on anything. You only need to specify the partitionId when you have a partitioned table in the list of join tables. (That is what the PARTITIONS clause is for, to identify partitionId = what?)…it was put BEFORE the SQL instead of within it…CQL took the opposite approach but PlayOrm can also join different partitions together as well ;) ). In this case, is partId the row id of TABLE CF? Nope, partId is one of the columns. There is a test case on this class in PlayOrm …(notice the annotation NoSqlPartitionByThisField on the column/field in the entity)… https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/db/PartitionedSingleTrade.java PlayOrm allows partitioned tables AND non-partioned tables(non-partitioned tables won't scale but maybe
RE: Cassandra Counters
IMO you would use Cassandra counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the non-feasibility of centralized counting by figuring the speed at which you need to sustain writes and reads and reconciling that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play, which is distributed counting. In distributed counting, the CAP theorem comes into play: in Cassandra, availability and partition tolerance trump consistency. So yes, you sacrifice strong consistency for availability and partition tolerance; you get eventual consistency.
Re: Correct model
Dean, There is one last thing I would like to ask about PlayOrm on this list; the next questions will go to Stack Overflow. Just because of the context, I prefer asking this here: when you say PlayOrm indexes a table (which would be a CF behind the scenes), what do you mean? Will PlayOrm automatically create a CF to index my CF? Will it auto-manage it, like Cassandra's secondary indexes? In Cassandra, the application is responsible for maintaining the index, right? I might be wrong, but unless I am using secondary indexes I need to update index values manually, right? I got confused when you said PlayOrm indexes the columns you choose. How do I choose, and what exactly does it mean? Best regards, Marcelo Valle.
Re: Correct model
PlayOrm will automatically create a CF to index my CF? It creates 3 CFs for all indices, IntegerIndice, DecimalIndice, and StringIndice, such that the ad-hoc tool that is in development can display the indices: it knows the prefix of the composite column name is an Integer, Decimal or String, and it knows the postfix type as well, so it can translate back from bytes to the types and properly display them in a GUI (i.e. on top of SELECT, the ad-hoc tool is adding a way to view the indice rows so you can check whether they got corrupted or not). Will it auto-manage it, like Cassandra's secondary indexes? YES. Further detail… You annotate fields with @NoSqlIndexed and PlayOrm adds/removes from the index as you add/modify/remove the entity…..a modify does a remove of the old value from the index and an insert of the new value into the index. An example: PlayOrm stores all long, int, short, byte values in a type that uses the least amount of space, so IF you have a long OR BigInteger between -128 and 128 it only ends up storing 1 byte in cassandra (SAVING tons of space!!!). Then if you are indexing a type that is one of those, PlayOrm creates an IntegerIndice table. Right now, another guy is working on playorm-server, which is a web GUI to allow ad-hoc access to all your data as well, so you can run ad-hoc queries to see data, and instead of showing hex, it shows the real values by translating the bytes to String - for the schema portions that it is aware of, that is. Later, Dean
Re: Is it possible to create a schema before a Cassandra node starts up ?
On Fri, Sep 14, 2012 at 7:05 AM, Xu, Zaili z...@pershing.com wrote: I am pretty new to Cassandra. I have a script that needs to set up a schema first before starting up the cassandra node. Is this possible ? Can I create the schema directly on cassandra storage and then when the node starts up it will pick up the schema ? Aaron gave you the scientific answer, which is that you can't load schema without starting a node. However if you : 1) start a node for the first time 2) load schema 3) call nodetool drain so all system keyspace CFs are guaranteed to be flushed to sstables 4) then, from your script, start that node (or a node with identical configuration) using the flushed system sstables (directly on the storage) You can set up a schema before starting up the cassandra node or having a cassandra node or cluster running all the time. This might be useful in for example testing contexts... =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
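Spelled out as commands, Rob's sequence is roughly the following (schema.cql and the data directory path are placeholders; exact paths depend on your cassandra.yaml):

  # 1. start a throwaway node with the target configuration, then load the schema
  cqlsh -f schema.cql
  # 2. force the system keyspace to be flushed to sstables
  nodetool drain
  # 3. copy the flushed system keyspace sstables (e.g. under /var/lib/cassandra/data/system/)
  #    into place before starting the real node; it comes up with the schema already defined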
Re: Correct model
Dean, this sounds like magic :D I don't know the details about the performance of the index implementations you chose, but it could pay off to use it in my case, as I don't need the best read performance in the world, but I do need to ensure scalability and keep a simple model to maintain. I liked the PlayOrm concept regarding this. I have more questions, but I will ask them on Stack Overflow from now on.
Prevent queries from OOM nodes
Is there anything I can do on the configuration side to prevent nodes from going OOM due to queries that will read large amounts of data and exceed the heap available? For the past few days we had some nodes consistently freezing/crashing with OOM. We got a heap dump into MAT and figured out the nodes were dying due to some queries for a few extremely large data sets. Tracked it back to an app that just didn't prevent users from doing these large queries, but it seems like Cassandra could be smart enough to guard against this type of thing? Basically some kind of setting like: if the data to satisfy the query > available heap, then throw an error to the caller and abort the query. I would much rather return errors to clients than crash a node, as the error is easier to track down and resolve that way. Thanks.
Cassandra compression not working?
Hello, We are running into an unusual situation that I'm wondering if anyone has any insight on. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. We enabled compression on the column family, utilizing the SnappyCompressor and a 64k chunk length. It was recently discovered that Cassandra was reporting a compression ratio of 0. I took a snapshot of the data and started a cassandra node in isolation to investigate. Running nodetool scrub or nodetool upgradesstables had little impact on the amount of data that was being stored. I then disabled compression and ran nodetool upgradesstables on the column family. Again, no impact on the data size stored. I then re-enabled compression and ran nodetool upgradesstables on the column family. This resulted in a 60% reduction in the data size stored, and Cassandra reporting a compression ratio of about .38. Any idea what is going on here? Obviously I can go through this process in production to enable compression, however, any idea what is currently happening and why new data does not appear to be compressed? Any insights are appreciated, Thanks, -Mike
Re: Cassandra compression not working?
I forgot to mention we are running Cassandra 1.1.2. Thanks, -Mike
performance for different kinds of row keys
Suppose two cases:
1. I have a Cassandra column family with non-composite row keys = incremental id
2. I have a Cassandra column family with composite row keys = incremental id 1 : group id
Which one will be faster to insert? And which one will be faster to read by incremental id? Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
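For reference, and only as a rough sketch (the column family names are made up, and Hector is assumed because other snippets in this digest use it), the two key shapes look roughly like this. With RandomPartitioner both keys are hashed whole, so inserts should cost about the same; but with the composite key you can only read a row if you know both components, since you cannot look it up by the incremental id alone.

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class KeyShapes {
        // Case 1: plain row key = incremental id
        static void insertPlain(Keyspace ks, long id, String value) {
            Mutator<Long> m = HFactory.createMutator(ks, LongSerializer.get());
            m.addInsertion(id, "PlainKeyCF",
                    HFactory.createColumn("data", value,
                            StringSerializer.get(), StringSerializer.get()));
            m.execute();
        }

        // Case 2: composite row key = (incremental id, group id)
        static void insertComposite(Keyspace ks, long id, long groupId, String value) {
            Composite rowKey = new Composite();
            rowKey.addComponent(id, LongSerializer.get());
            rowKey.addComponent(groupId, LongSerializer.get());

            Mutator<Composite> m = HFactory.createMutator(ks, new CompositeSerializer());
            m.addInsertion(rowKey, "CompositeKeyCF",
                    HFactory.createColumn("data", value,
                            StringSerializer.get(), StringSerializer.get()));
            m.execute();
        }
    }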
Re: Code example for CompositeType.Builder and SSTableSimpleUnsortedWriter
Hey... From my understanding, there are several ways to use composites with SSTableSimpleUnsortedWriter, but which is the best? And as usual, code examples are welcome ;) Thanks in advance! On Thu, Sep 20, 2012 at 11:23 PM, Edward Kibardin infa...@gmail.com wrote: Hi Everyone, I'm writing a conversion tool from CSV files to SSTables using SSTableSimpleUnsortedWriter and am unable to find a good example of using CompositeType.Builder with SSTableSimpleUnsortedWriter. It would also be great if someone had sample code for inserting/updating only a single value in a composite (if that is possible at all). A quick Google search didn't help me, so I would be very grateful for a correct sample ;) Thanks in advance, Ed
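Not an authoritative answer, but here is a minimal sketch of one way to combine the two, assuming the Cassandra 1.1.x bulk-loading API used in the DataStax bulk loading blog post; the keyspace/CF names, the (long, text) composite layout and the values are invented for illustration, and the output directory must already exist:

    import java.io.File;
    import java.nio.ByteBuffer;
    import java.util.Arrays;

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.CompositeType;
    import org.apache.cassandra.db.marshal.LongType;
    import org.apache.cassandra.db.marshal.UTF8Type;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    public class CsvToSSTable {
        public static void main(String[] args) throws Exception {
            // Column names are (long, text) composites, e.g. (timestamp, field name).
            CompositeType comparator = CompositeType.getInstance(
                    Arrays.<AbstractType<?>>asList(LongType.instance, UTF8Type.instance));

            SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                    new File("/tmp/MyKeyspace/MyCF"),   // must exist before writing
                    new RandomPartitioner(),
                    "MyKeyspace", "MyCF",
                    comparator, null,
                    64);                                // buffer size in MB

            long timestamp = System.currentTimeMillis() * 1000;

            writer.newRow(UTF8Type.instance.fromString("row-1"));

            // Build one composite column name per CSV value.
            CompositeType.Builder name = new CompositeType.Builder(comparator);
            name.add(LongType.instance.decompose(1348500000000L));  // first component
            name.add(UTF8Type.instance.fromString("price"));        // second component
            ByteBuffer columnName = name.build();

            writer.addColumn(columnName, UTF8Type.instance.fromString("42.0"), timestamp);
            writer.close();
        }
    }

The resulting sstables can then be streamed into the cluster with sstableloader, as in the blog post. As far as I understand, there is no way to update a single component of an existing composite column name in place; you write a whole new column whose composite name differs in that component.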
Re: any ways to have compaction use less disk space?
On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote: Why so? What are pluses and minuses? As for me, I am looking for number of files in directory. 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 70 files, that is too much for single directory, too much memory used for SST data, too huge compaction queue (that leads to strange pauses, I suppose because of compactor thinking what to compact next),... Not sure why a lot of files is a problem... modern filesystems deal with that pretty well. Really large sstables mean that compactions now are taking a lot more disk IO and time to complete. Remember, Leveled Compaction is more disk IO intensive, so using large sstables makes that even worse. This is a big reason why the default is 5MB. Also, each level is 10x the size of the previous level. Also, for level compaction, you need 10x the sstable size worth of free space to do compactions. So now you need 5GB of free disk, vs 50MB of free disk. Also, if you're doing deletes in those CF's, that old, deleted data is going to stick around a LOT longer with 512MB files, because it can't get deleted until you have 10x512MB files to compact to level 2. Heaven forbid it doesn't get deleted then, because each level is 10x bigger, so you end up waiting a LOT longer to actually delete that data from disk. Now, if you're using SSD's then larger sstables are probably doable, but even then I'd guesstimate 50MB is far more reasonable than 512MB. -Aaron 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink you data much faster in case of updates. Size compaction can give you 3x-4x more space used than there are live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after 4th week they all will be compacted into one 300GB SSTable. Leveled compaction've tamed space for us. Note that you should set sstable_size_in_mb to reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me- especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
Haha, OK. It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target: schema, request rate, hardware. Generally tuning nudges a couple of variables in one direction or the other and you see some decent returns. But each nudge takes a restart and a warm-up period, and with how Cassandra distributes requests you likely have to flip several nodes or all of them before you can see the change! By the time you do that it's probably a different day or week. Essentially finding out if one setting is better than the other is like a 3 day test in production. Before C* I used to deal with this in Tomcat. Once in a while we would get a dev that read some article about tuning, something about a new JVM, or collector. With bright-eyed enthusiasm they would want to try tuning our current cluster. They would spend a couple of days, measure something, and say it was good: lower memory usage. Meanwhile someone else would come to me and report higher 95th percentile response time. More short pauses, fewer long pauses, great taste, less filling. Most people just want to roflscale their Heroku cloud. Tuning stuff is sysadmin work and the cloud has taught us that sysadmins are a needless waste of money. Just kidding! But I do believe the default Cassandra settings are reasonable, and typically I find that most who look at tuning GC usually need more hardware and actually need to be tuning something somewhere else. G1 is the perfect example of a time suck. It claims low pause latency for big heaps, and delivers something regarded by the Cassandra community (and HBase as well) as working worse than CMS. If you spent 3 hours switching tuning knobs and analysing, that is 3 hours of your life you will never get back. Better to let Sun and other people worry about tuning (at least from where I sit). On Saturday, September 15, 2012, Peter Schuller peter.schul...@infidyne.com wrote: Generally tuning the garbage collector is a waste of time. Sorry, that's BS. It can be absolutely critical, when done right, and only useless when done wrong. There's a spectrum in between. Just follow someone else's recommendation and use that. No, don't. Most recommendations out there are completely useless in the general case because someone did some very specific benchmark under very specific circumstances and then recommends some particular combination of options. In order to understand whether a particular recommendation applies to you, you need to know enough about your use-case that I suspect you're better off just reading up on the available options and figuring things out. Of course, randomly trying various different settings to see which seems to work well may be realistic - but you lose predictability (in the face of changing patterns of traffic for example) if you don't know why it's behaving like it is. If you care about GC-related behavior you want to understand how the application behaves, how the garbage collector behaves, what your requirements are, and select settings based on those requirements and how the application and GC behavior combine to produce emergent behavior. The best GC options may vary *wildly* depending on the nature of your cluster and your goals. There are also non-GC settings (in the specific case of Cassandra) that affect the interaction with the garbage collector, like whether you're using row/key caching, or things like phi conviction threshold and/or timeouts. It's very hard for anyone to give generalized recommendations.
If it weren't, Cassandra would ship with The One True set of settings that are always the best and there would be no discussion. It's very unfortunate that the state of GC in the freely available JVMs is at this point, given that there exist known and working algorithms (and at least one practical implementation) that avoid it, mostly. But it's the situation we're in. The only way around it that I know of, if you're on Hotspot, is to have the application behave in such a way that it avoids the causes of unpredictable behavior w.r.t. GC by being careful about its memory allocation and *retention* profile. For the specific case of avoiding *ever* seeing a full gc, it gets even more complex. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Secondary index loss on node restart
Can you contribute your experience to this ticket https://issues.apache.org/jira/browse/CASSANDRA-4670 ? Thanks - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/09/2012, at 6:22 AM, Michael Theroux mthero...@yahoo.com wrote: Hello, We have been noticing an issue where, about 50% of the time when a node fails or is restarted, secondary indexes appear to be partially lost or corrupted. A drop and re-add of the index appears to correct the issue. There are no errors in the Cassandra logs that I see. Part of the index seems to be simply missing. Sometimes this corruption/loss doesn't happen immediately, but some time after the node is restarted. In addition, the index never appears to have an issue when the node comes down; it is only after the node comes back up and recovers that we experience an issue. We developed some code that goes through all the rows, by key, in the table on which the index is defined. It then attempts to look up the information via the secondary index, in an attempt to detect when the issue occurs. Another odd observation is that the number of members present in the index when we have the issue varies up and down (the index and the tables don't change that often). We are running a 6 node Cassandra cluster with a replication factor of 3; the consistency level for all queries is LOCAL_QUORUM. We are running Cassandra 1.1.2. Anyone have any insights? -Mike
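For what it's worth, the kind of checker described above could look roughly like this with Hector (purely an illustrative sketch; the column family and column names are placeholders, and Hector is assumed only because other snippets in this digest use it):

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.IndexedSlicesQuery;

    public class IndexChecker {
        // For a row already fetched by key, verify that the secondary index on
        // "indexed_col" still returns it. Returns false for a missing index entry.
        static boolean indexStillFindsRow(Keyspace ks, String cf,
                                          String rowKey, String indexedValue) {
            IndexedSlicesQuery<String, String, String> q =
                    HFactory.createIndexedSlicesQuery(ks, StringSerializer.get(),
                            StringSerializer.get(), StringSerializer.get());
            q.setColumnFamily(cf);
            q.setStartKey("");
            q.setColumnNames("indexed_col");
            q.addEqualsExpression("indexed_col", indexedValue);

            OrderedRows<String, String, String> rows = q.execute().get();
            for (Row<String, String, String> row : rows) {
                if (row.getKey().equals(rowKey)) {
                    return true;   // the index still knows about this row
                }
            }
            return false;          // candidate for the partial index loss described above
        }
    }

In a real checker you would also page through the index query with setRowCount/setStartKey, since only a limited number of rows comes back per call.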
Re: any ways to have compaction use less disk space?
If you are using ext3 there is a hard limit of 32K on the number of files in a directory. EXT4 has a much higher limit (can't remember exactly, IIRC). So it's true that having many files is not a problem for the file system, though your VFS cache could be less efficient since you would have a higher inode-to-data ratio. Edward On Mon, Sep 24, 2012 at 7:03 PM, Aaron Turner synfina...@gmail.com wrote: On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин tiv...@gmail.com wrote: Why so? What are pluses and minuses? As for me, I am looking for number of files in directory. 700GB/512MB*5(files per SST) = 7000 files, that is OK from my view. 700GB/5MB*5 = 70 files, that is too much for single directory, too much memory used for SST data, too huge compaction queue (that leads to strange pauses, I suppose because of compactor thinking what to compact next),... Not sure why a lot of files is a problem... modern filesystems deal with that pretty well. Really large sstables mean that compactions now are taking a lot more disk IO and time to complete. Remember, Leveled Compaction is more disk IO intensive, so using large sstables makes that even worse. This is a big reason why the default is 5MB. Also, each level is 10x the size of the previous level. Also, for level compaction, you need 10x the sstable size worth of free space to do compactions. So now you need 5GB of free disk, vs 50MB of free disk. Also, if you're doing deletes in those CF's, that old, deleted data is going to stick around a LOT longer with 512MB files, because it can't get deleted until you have 10x512MB files to compact to level 2. Heaven forbid it doesn't get deleted then, because each level is 10x bigger, so you end up waiting a LOT longer to actually delete that data from disk. Now, if you're using SSD's then larger sstables are probably doable, but even then I'd guesstimate 50MB is far more reasonable than 512MB. -Aaron 2012/9/23 Aaron Turner synfina...@gmail.com On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink you data much faster in case of updates. Size compaction can give you 3x-4x more space used than there are live data. Consider the following (our simplified) scenario: 1) The data is updated weekly 2) Each week a large SSTable is written (say, 300GB) after full update processing. 3) In 3 weeks you will have 1.2TB of data in 3 large SSTables. 4) Only after 4th week they all will be compacted into one 300GB SSTable. Leveled compaction've tamed space for us. Note that you should set sstable_size_in_mb to reasonably high value (it is 512 for us with ~700GB per node) to prevent creating a lot of small files. 512MB per sstable? Wow, that's freaking huge. From my conversations with various developers 5-10MB seems far more reasonable. I guess it really depends on your usage patterns, but that seems excessive to me- especially as sstables are promoted. -- Best regards, Vitalii Tymchyshyn -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: [problem with OOM in nodes]
What exactly is the problem with big rows? During compaction the row will be passed through a slower two pass processing, this add's to IO pressure. Counting big rows requires that the entire row be read. Repairing big rows requires that the entire row be repaired. I generally avoid rows above a few 10's of MB as they result in more memory churn and create admin problems as above. What exactly is the problem with big rows? And, how can we should place our data in this case (see the schema in the previous replies)? Splitting one report to multiple rows is uncomfortably :-( Looking at your row sizes below, the question is How do I store an object which may be up to 3.5GB in size. AFAIK there are no hard limits that would prevent you putting that in one row. And avoiding super columns may save some space. You could have a Simple CF, where the each report is one row, each report row is one column and the report row is serialised (with JSON or protobufs etc) and stored in the column value. But i would recommend creating a model where row size is constrained in space. E.g. Report CF: * one report per row. * one column per report row * column value is empty. Report Rows CF: * one row per 100 report rows, e.g. report_id : first_row_number * column name is report row number. * column value is report data (Or use composite column names, e.g. row_number : report_column You can still do ranges, buy you have to do some client side work to work it out. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/09/2012, at 5:14 PM, Denis Gabaydulin gaba...@gmail.com wrote: On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com wrote: /var/log/cassandra$ cat system.log | grep Compacting large | grep -E [0-9]+ bytes -o | cut -d -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo MB }' | sort -nr | head -n 50 Is it bad signal? Sorry, I do not know what this is outputting. This is outputting size of big rows which cassandra had compacted before. As I can see in cfstats, compacted row maximum size: 386857368 ! Yes. Having rows in the 100's of MB is will cause problems. Doubly so if they are large super columns. What exactly is the problem with big rows? And, how can we should place our data in this case (see the schema in the previous replies)? Splitting one report to multiple rows is uncomfortably :-( Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/09/2012, at 5:07 AM, Denis Gabaydulin gaba...@gmail.com wrote: And some stuff from log: /var/log/cassandra$ cat system.log | grep Compacting large | grep -E [0-9]+ bytes -o | cut -d -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo MB }' | sort -nr | head -n 50 3821.55MB 3337.85MB 1221.64MB 1128.67MB 930.666MB 916.4MB 861.114MB 843.325MB 711.813MB 706.992MB 674.282MB 673.861MB 658.305MB 557.756MB 531.577MB 493.112MB 492.513MB 492.291MB 484.484MB 479.908MB 465.742MB 464.015MB 459.95MB 454.472MB 441.248MB 428.763MB 424.028MB 416.663MB 416.191MB 409.341MB 406.895MB 397.314MB 388.27MB 376.714MB 371.298MB 368.819MB 366.92MB 361.371MB 360.509MB 356.168MB 355.012MB 354.897MB 354.759MB 347.986MB 344.109MB 335.546MB 329.529MB 326.857MB 326.252MB 326.237MB Is it bad signal? On Fri, Sep 21, 2012 at 8:22 PM, Denis Gabaydulin gaba...@gmail.com wrote: Found one more intersting fact. As I can see in cfstats, compacted row maximum size: 386857368 ! 
On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin gaba...@gmail.com wrote: Reports - is a SuperColumnFamily Each report has unique identifier (report_id). This is a key of SuperColumnFamily. And a report saved in separate row. A report is consisted of report rows (may vary between 1 and 50, but most are small). Each report row is saved in separate super column. Hector based code: superCfMutator.addInsertion( report_id, Reports, HFactory.createSuperColumn( report_row_id, mapper.convertObject(object), columnDefinition.getTopSerializer(), columnDefinition.getSubSerializer(), inferringSerializer ) ); We have two frequent operation: 1. count report rows by report_id (calculate number of super columns in the row). 2. get report rows by report_id and range predicate (get super columns from the row with range predicate). I can't see here a big super columns :-( On Fri, Sep 21, 2012 at 3:10 AM, Tyler Hobbs ty...@datastax.com wrote: I'm not 100% that I understand your data model and read patterns correctly, but it sounds like you have large supercolumns and are requesting some of the subcolumns from individual super columns. If that's the case, the issue is that Cassandra must deserialize the entire supercolumn in memory whenever you read *any* of the subcolumns. This is one of the reasons why
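To make Aaron's bucketed Report Rows layout above concrete, here is a rough Hector-style sketch (the CF name, the bucket size of 100 and the serializers are illustrative assumptions, not a prescription):

    import me.prettyprint.cassandra.serializers.IntegerSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ReportRowWriter {
        private static final int BUCKET_SIZE = 100;   // report rows per Cassandra row

        // Row key = report_id : first_row_number_of_bucket, so no single
        // Cassandra row grows beyond BUCKET_SIZE columns of report data.
        static void addReportRow(Keyspace ks, String reportId,
                                 int reportRowNumber, String serializedRow) {
            int bucketStart = (reportRowNumber / BUCKET_SIZE) * BUCKET_SIZE;
            String bucketKey = reportId + ":" + bucketStart;

            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.addInsertion(bucketKey, "ReportRows",
                    HFactory.createColumn(reportRowNumber, serializedRow,
                            IntegerSerializer.get(), StringSerializer.get()));
            m.execute();
        }
    }

Counting report rows then means summing the column counts of the (report_id : bucket) rows for a report, or keeping a separate per-report count, instead of counting columns in one ever-growing row.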
Re: Cassandra compression not working?
You are going to need a fully optimized flux-capacitor for that. On Tue, Sep 25, 2012 at 5:00 AM, Michael Theroux mthero...@yahoo.comwrote: Hello, We are running into an unusual situation that I'm wondering if anyone has any insight on. We've been running a Cassandra cluster for some time, with compression enabled on one column family in which text documents are stored. We enabled compression on the column family, utilizing the SnappyCompressor and a 64k chunk length. It was recently discovered that Cassandra was reporting a compression ratio of 0. I took a snapshot of the data and started a cassandra node in isolation to investigate. Running nodetool scrub, or nodetool upgradesstables had little impact on the amount of data that was being stored. I then disabled compression and ran nodetool upgradesstables on the column family. Again, not impact on the data size stored. I then reenabled compression and ran nodetool upgradesstables on the column family. This resulting in a 60% reduction in the data size stored, and Cassandra reporting a compression ration of about .38. Any idea what is going on here? Obviously I can go through this process in production to enable compression, however, any idea what is currently happening and why new data does not appear to be compressed? Any insights are appreciated, Thanks, -Mike
Re: JVM 7, Cass 1.1.1 and G1 garbage collector
It is not a total waste, but practically your time is better spent in other places. The problem is just about everything is a moving target: schema, request rate, hardware. Generally tuning nudges a couple of variables in one direction or the other and you see some decent returns. But each nudge takes a restart and a warm-up period, and with how Cassandra distributes requests you likely have to flip several nodes or all of them before you can see the change! By the time you do that it's probably a different day or week. Essentially finding out if one setting is better than the other is like a 3 day test in production. Before C* I used to deal with this in Tomcat. Once in a while we would get a dev that read some article about tuning, something about a new JVM, or collector. With bright-eyed enthusiasm they would want to try tuning our current cluster. They would spend a couple of days, measure something, and say it was good: lower memory usage. Meanwhile someone else would come to me and report higher 95th percentile response time. More short pauses, fewer long pauses, great taste, less filling. That's why blind blackbox testing isn't the way to go. Understanding what the application does, what the GC does, and the goals you have in mind is more fruitful. For example, are you trying to improve p99? Maybe you want to improve p999 at the cost of worse p99? What about failure modes (non-happy cases)? Perhaps you don't care about few-hundred-ms pauses but want to avoid full gc:s? There are lots of different goals one might have, and workloads. Testing is key, but only in combination with some directed choice of what to tweak. Especially since it's hard to test for the non-happy cases (e.g., the node takes a burst of traffic and starts promoting everything into old-gen prior to processing a request, resulting in a death spiral). G1 is the perfect example of a time suck. It claims low pause latency for big heaps, and delivers something regarded by the Cassandra community (and HBase as well) as working worse than CMS. If you spent 3 hours switching tuning knobs and analysing, that is 3 hours of your life you will never get back. This is similar to saying that someone told you to switch to CMS (or use some particular flag, etc), you tried it, and it didn't have the result you expected. G1 and CMS have different trade-offs. Neither one will consistently result in better latencies across the board. It's all about the details. Better to let Sun and other people worry about tuning (at least from where I sit). They're not tuning. They are providing very general-purpose default behavior, including things that make *no* sense at all with Cassandra. For example, the default behavior with CMS is to try to make the marking phase run as late as possible so that it finishes just prior to heap exhaustion, in order to optimize for throughput; except that's not a good idea in many cases because it exacerbates fragmentation problems in old-gen by pushing usage very high repeatedly, and it increases the chance of full gc because marking started too late (even if you don't hit promotion failures due to fragmentation). Sudden changes in workloads (e.g., compaction kicks in) also make it harder for CMS's mark triggering heuristics to work well. As such, the default options for Cassandra use certain settings that diverge from the default behavior of the JVM, because Cassandra-in-general is much more specific a use-case than the completely general target audience of the JVM.
Similarly, a particular cluster (with certain workloads/goals/etc) is a yet more specific use-case than Cassandra-in-general and may be better served by settings that differ from those of default Cassandra. But I certainly agree with this (which I think roughly matches what you're saying): Don't randomly pick options someone claims are good in a blog post and expect them to just make things better. If it were that easy, it would be the default behavior for obvious reasons. The reason it's not is likely that it depends on the situation. Further, even if you do play the lottery and win - if you don't know *why*, how are you able to extrapolate the behavior of the system with slightly changed workloads? It's very hard to blackbox-test GC settings, which is probably why GC tuning can be perceived as a useless game of whack-a-mole. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re:
Hi Manu, Glad that you have the issue resolved. If i understand the issue correctly Your cassandra installation had RandomParitioner but the bulk loader configuration (cassandra.yaml) had Murmur3Partitioner? By fixing the cassandra.yaml for the bulk loader the issue got resolved? If not then we might have a bug and your feedback might help the community. Regards, /VJ On Wed, Sep 19, 2012 at 10:41 PM, Manu Zhang owenzhang1...@gmail.comwrote: the problem seems to have gone away with changing Murmur3Partitioner back to RandomPartitioner On Thu, Sep 20, 2012 at 11:14 AM, Manu Zhang owenzhang1...@gmail.comwrote: Yeah, BulkLoader. You did help me to elaborate my question. Thanks! On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman mkjell...@barracuda.com wrote: I assumed you were talking about BulkLoader. I haven't played with trunk yet so I'm afraid I won't be much help here... On Sep 19, 2012, at 7:56 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here http://www.datastax.com/dev/blog/bulk-loading#comment-127019; buffer size is 64 MB as in the example; I'm dealing with about 1GB data. job config, you mean? On Thu, Sep 20, 2012 at 10:32 AM, Michael Kjellman mkjell...@barracuda.commailto:mkjell...@barracuda.com wrote: A few questions: what version of 1.1 are you running. What version of Hadoop? What is your job config? What is the buffer size you've chosen? How much data are you dealing with? On Sep 19, 2012, at 7:23 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: I've been bulk loading data into Cassandra and seen the following exception: ERROR 10:10:31,032 Exception in thread Thread[CompactionExecutor:5,1,main] java.lang.RuntimeException: Last written key DecoratedKey(-442063125946754, 313130303136373a31) = current key DecoratedKey(-465541023623745, 313036393331333a33) writing into /home/manuzhang/cassandra/data/tpch/lineitem/tpch-lineitem-tmp-ia-56-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:131) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:152) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:169) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:69) at org.apache.cassandra.db.compaction.CompactionManager$1.run(CompactionManager.java:152) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The running Cassandra and that I load data into are the same one. What's the cause? 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook
RE: Cassandra Counters
Thanks Milind, Has anyone implemented counting in a standard col family in Cassandra, where you can have increments and decrements to the count? Any comparisons in performance to using counter column families? Regards, Roshni Date: Mon, 24 Sep 2012 11:02:51 -0700 Subject: RE: Cassandra Counters From: milindpar...@gmail.com To: user@cassandra.apache.org IMO you would use Cassandra Counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the non-feasibility of centralized counting by figuring out the speed at which you need to sustain writes and reads and reconciling that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play, which is distributed counting. In distributed counting, the CAP theorem comes to life. In Cassandra, Availability and Network Partitioning trump Consistency. So yes, you sacrifice strong consistency for availability and partition tolerance; for eventual consistency. On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Hi folks, I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my queries pointwise. a) What are the performance tradeoffs on reads/writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters? b) What's the current state of counter limitations in the latest version of Apache Cassandra? c) With there being a possibility of counter values getting out of sync, would counters not be recommended where strong consistency is desired? The normal benefits of Cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. Regards, Roshni From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi, I'm trying to understand if counters are a good fit for my use case. I've watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items to which I can add or delete a set of items at a time, and I want a count of the items. Without considering changing the database or additional components like ZooKeeper, I have 2 options: the first is a counter col family, and the second is a standard one.

1. List_Counter_CF:
   ListId -> { TotalItems: 50 }

2. List_Std_CF:
   ListId -> { TimeUUID1: 3, TimeUUID2: 70, TimeUUID3: -20, TimeUUID4: 3, TimeUUID5: -6 }

And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, I'd need to read the row, slice through all columns and add them. In both cases the writes should be fast; in fact the standard col family should be faster as there's no read before write. And for a CL ONE write the latency should be the same. For reads, the first option is very good: just read one column for a key. For the second, the read involves reading the row and adding each column value via application code. I don't think there's a way to do math via CQL yet. There should be no hot spotting if the key is sharded well.
I could even maintain the count derived from the List_Std_CF in a separate column family, a standard col family with the final number, but I could do that as a separate process immediately after the write to List_Std_CF completes, so that it's not blocking. I understand Cassandra is faster for writes than reads, but how slow would reading by row key be...? Is there any figure for how many columns it takes before performance starts deteriorating, or how much worse the performance would be? The advantage I see is that I can use the same consistency rules as for the rest of the column families. If quorum for reads and writes, then you get strongly consistent values. In the case of counters I see that, in case of timeout exceptions because the first replica is down or not responding, there's a chance of the values getting messed up, and re-trying can mess it up further. It's not idempotent like a standard col family design can be. If it gets messed up, it would need an administrator's help (is there a document on how we could resolve counter values going wrong?). I believe the rest of the limitations still hold good - has anything changed in recent versions? In my opinion, they are not as major as the consistency question.
- removing a counter then modifying the value - behaviour is undetermined
- special process for counter col family sstable loss (need to remove all files)
- no TTL support
- no secondary indexes
In short, I can recommend counters can be used for
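For concreteness, the two options above look roughly like this with Hector; this is only an illustrative sketch (the CF names follow the example above, pagination is omitted, and it is not a recommendation either way):

    import java.util.UUID;

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.serializers.UUIDSerializer;
    import me.prettyprint.cassandra.utils.TimeUUIDUtils;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class ListCounts {
        // Option 1: counter column family -- one increment (or decrement) per change.
        static void adjustCount(Keyspace ks, String listId, long delta) {
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.insertCounter(listId, "List_Counter_CF",
                    HFactory.createCounterColumn("TotalItems", delta, StringSerializer.get()));
        }

        // Option 2: standard column family -- append a delta column, sum on read.
        static void appendDelta(Keyspace ks, String listId, long delta) {
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
            m.addInsertion(listId, "List_Std_CF",
                    HFactory.createColumn(TimeUUIDUtils.getUniqueTimeUUIDinMillis(), delta,
                            UUIDSerializer.get(), LongSerializer.get()));
            m.execute();
        }

        static long totalItems(Keyspace ks, String listId) {
            SliceQuery<String, UUID, Long> q = HFactory.createSliceQuery(
                    ks, StringSerializer.get(), UUIDSerializer.get(), LongSerializer.get());
            q.setColumnFamily("List_Std_CF");
            q.setKey(listId);
            q.setRange(null, null, false, Integer.MAX_VALUE);  // paginate in real code

            long total = 0;
            ColumnSlice<UUID, Long> slice = q.execute().get();
            for (HColumn<UUID, Long> c : slice.getColumns()) {
                total += c.getValue();    // sum the increments/decrements client-side
            }
            return total;
        }
    }

The retry caveat in the message above is the key difference: appendDelta can be made idempotent by reusing the same TimeUUID when retrying a timed-out write, while retrying adjustCount after a timeout may double-count.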
Re:
I had Murmur3Partitioner for both of them, otherwise bulk loader would have complained since I put them under the same project. I saw some negative token issues of Murmur3Partitioner on JIRA recently so I moved back to RandomPartitioner. Thanks for your concern On Tue, Sep 25, 2012 at 12:49 PM, Vijay vijay2...@gmail.com wrote: Hi Manu, Glad that you have the issue resolved. If i understand the issue correctly Your cassandra installation had RandomParitioner but the bulk loader configuration (cassandra.yaml) had Murmur3Partitioner? By fixing the cassandra.yaml for the bulk loader the issue got resolved? If not then we might have a bug and your feedback might help the community. Regards, /VJ On Wed, Sep 19, 2012 at 10:41 PM, Manu Zhang owenzhang1...@gmail.comwrote: the problem seems to have gone away with changing Murmur3Partitioner back to RandomPartitioner On Thu, Sep 20, 2012 at 11:14 AM, Manu Zhang owenzhang1...@gmail.comwrote: Yeah, BulkLoader. You did help me to elaborate my question. Thanks! On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman mkjell...@barracuda.com wrote: I assumed you were talking about BulkLoader. I haven't played with trunk yet so I'm afraid I won't be much help here... On Sep 19, 2012, at 7:56 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here http://www.datastax.com/dev/blog/bulk-loading#comment-127019; buffer size is 64 MB as in the example; I'm dealing with about 1GB data. job config, you mean? On Thu, Sep 20, 2012 at 10:32 AM, Michael Kjellman mkjell...@barracuda.commailto:mkjell...@barracuda.com wrote: A few questions: what version of 1.1 are you running. What version of Hadoop? What is your job config? What is the buffer size you've chosen? How much data are you dealing with? On Sep 19, 2012, at 7:23 PM, Manu Zhang owenzhang1...@gmail.com mailto:owenzhang1...@gmail.com wrote: I've been bulk loading data into Cassandra and seen the following exception: ERROR 10:10:31,032 Exception in thread Thread[CompactionExecutor:5,1,main] java.lang.RuntimeException: Last written key DecoratedKey(-442063125946754, 313130303136373a31) = current key DecoratedKey(-465541023623745, 313036393331333a33) writing into /home/manuzhang/cassandra/data/tpch/lineitem/tpch-lineitem-tmp-ia-56-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:131) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:152) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:169) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:69) at org.apache.cassandra.db.compaction.CompactionManager$1.run(CompactionManager.java:152) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) The running Cassandra and that I load data into are the same one. What's the cause? 'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. 
Re: Cassandra Counters
Maybe I'm missing the point, but counting in a standard column family would be a little overkill. I assume that distributed counting here was more of a map/reduce approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a lot. We're doing some more complex counting (e.q. based on sets of rules) like that. Of course, that would perform _way_ slower than counting beforehand. On the other side, you will always have a consistent result for a consistent dataset. On the other hand, if you use things like AMQP or Storm (sorry to put up my sentence together like that, as tools are mostly either orthogonal or complementary, but I hope you get my point), you could build a topology that makes fault-tolerant writes independently of your original write. Of course, it would still have a consistency tradeoff, mostly because of race conditions and different network latencies etc. So I would say that building a data model in a distributed system often depends more on your problem than on the common patterns, because everything has a tradeoff. Want to have an immediate result? Modify your counter while writing the row. Can sacrifice speed, but have more counting opportunities? Go with offline distributed counting. Want to have kind of both, dispatch a message and react upon it, having the processing logic and writes decoupled from main application, allowing you to care less about speed. However, I may have missed the point somewhere (early morning, you know), so I may be wrong in any given statement. Cheers On Tue, Sep 25, 2012 at 6:53 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Thanks Milind, Has anyone implemented counting in a standard col family in cassandra, when you can have increments and decrements to the count. Any comparisons in performance to using counter column families? Regards, Roshni -- Date: Mon, 24 Sep 2012 11:02:51 -0700 Subject: RE: Cassandra Counters From: milindpar...@gmail.com To: user@cassandra.apache.org IMO You would use Cassandra Counters (or other variation of distributed counting) in case of having determined that a centralized version of counting is not going to work. You'd determine the non_feasibility of centralized counting by figuring the speed at which you need to sustain writes and reads and reconcile that with your hard disk seek times (essentially). Once you have proved that you can't do centralized counting, the second layer of arsenal comes into play; which is distributed counting. In distributed counting , the CAP theorem comes into life. in Cassandra, Availability and Network Partitioning trumps over Consistency. So yes, you sacrifice strong consistency for availability and partion tolerance; for eventual consistency. On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com wrote: Hi folks, I looked at my mail below, and Im rambling a bit, so Ill try to re-state my queries pointwise. a) what are the performance tradeoffs on reads writes between creating a standard column family and manually doing the counts by a lookup on a key, versus using counters. b) whats the current state of counters limitations in the latest version of apache cassandra? c) with there being a possibilty of counter values getting out of sync, would counters not be recommended where strong consistency is desired. The normal benefits of cassandra's tunable consistency would not be applicable, as re-tries may cause overstating. So the normal use case is high performance, and where consistency is not paramount. 
Regards, roshni -- From: roshni_rajago...@hotmail.com To: user@cassandra.apache.org Subject: Cassandra Counters Date: Mon, 24 Sep 2012 16:21:55 +0530 Hi , I'm trying to understand if counters are a good fit for my use case. Ive watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times over now... and still need help! Suppose I have a list of items- to which I can add or delete a set of items at a time, and I want a count of the items, without considering changing the database or additional components like zookeeper, I have 2 options_ the first is a counter col family, and the second is a standard one 1. List_Counter_CFTotalItemsListId 502.List_Std_CF TimeUUID1 TimeUUID2 TimeUUID3 TimeUUID4 TimeUUID5 ListId 3 70 -20 3 -6 And in the second I can add a new col with every set of items added or deleted. Over time this row may grow wide. To display the final count, Id need to read the row, slice through all columns and add them. In both cases the writes should be fast, in fact standard col family should be faster as there's no read, before write. And for CL ONE write the latency should be same. For reads, the first option is very good, just read one column for a key For the second, the read involves reading the row, and adding each column value via application code. I