[BETA RELEASE] Apache Cassandra 1.2.0-beta2 released
The Cassandra team is pleased to announce the release of the second beta for the future Apache Cassandra 1.2.0. Let me first stress that this is beta software and as such is *not* ready for production use; it is likely not bug free. However, a lot has been fixed since beta1, and if everything goes right we are hopeful that a first release candidate may follow shortly. Please do help test this beta to make that happen. If you encounter any problem during your testing, please report it[3,4]. And be sure to take a look at the change log[1] and the release notes[2] to see where Cassandra 1.2 differs from the previous series. Apache Cassandra 1.2.0-beta2[5] is available as usual from the Cassandra website (http://cassandra.apache.org/download/), and a Debian package is available using the 12x branch (see http://wiki.apache.org/cassandra/DebianPackaging). Thank you for your help in testing, and have fun with it. [1]: http://goo.gl/wnDAV (CHANGES.txt) [2]: http://goo.gl/CBsqs (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA [4]: user@cassandra.apache.org [5]: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta2
How to search User (Entity) columns without sec. index?
Here is the thing. I'm modelling a User entity and ran into a problem with searching through user columns.

CREATE TABLE users (
    user_uuid uuid PRIMARY KEY,
    date_created timestamp,
    password varchar,
    username varchar,
    name varchar,
    first_name varchar,
    last_name varchar,
    email varchar,
    ...
);

CREATE INDEX users__username_idx ON users (username);

Now I know it's bad practice to put a secondary index on 'username' because of its uniqueness and all, but what's the alternative? I want 'username' to be searchable. Tnx, *Alan Ristić* *m*: 040 423 688
Re: How to search User (Entity) columns without sec. index?
I think there are just a few solutions: - A secondary index on username - A CF used as an index (store the username as the row key and all the uuids of users with that username as columns; see the sketch below) - Get all the data and filter afterwards (really poor performance, depending on the size of the data set) I can't see another way to perform your query. 2012/11/9 Alan Ristić alan.ris...@gmail.com Here is the thing. I'm modelling a User entity and ran into a problem with searching through user columns. CREATE TABLE users ( user_uuid uuid PRIMARY KEY, date_created timestamp, password varchar, username varchar, name varchar, first_name varchar, last_name varchar, email varchar, ... ) ; CREATE INDEX users__username_idx ON users (username); Now I know it's bad practice to put a secondary index on 'username' because of its uniqueness and all, but what's the alternative? I want 'username' to be searchable. Tnx, *Alan Ristić* *m*: 040 423 688
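To illustrate the index-CF option, here is a minimal sketch in CQL3. The table name users_by_username is hypothetical, and uniqueness still has to be enforced in application code, since Cassandra has no unique constraint:

    -- One row per username, pointing back at the users table.
    CREATE TABLE users_by_username (
        username varchar PRIMARY KEY,
        user_uuid uuid
    );

    -- On signup, read-before-write in the application to reject duplicates,
    -- then write both records:
    INSERT INTO users_by_username (username, user_uuid)
    VALUES ('alan', 62c36092-82a1-11e2-9e96-0800200c9a66);

    -- Login/search path: resolve the username, then fetch the full user row.
    SELECT user_uuid FROM users_by_username WHERE username = 'alan';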
Remove crashed node
Hi there, We have had a crashed node that is currently removed from the rack. However, when I try a schema upgrade / truncate operation it complains about the unreachable node. I tried nodetool removetoken, but that didn't resolve it. Any ideas on how to fix this? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
Re: Remove crashed node
If you run 'describe cluster;' in cassandra-cli, I guess you will see an UNREACHABLE node. If you do, there is a way to remove this unreachable node. Go to the JMX management console (ip_of_one_up_node:8081 by default). Then go to the org.apache.cassandra.net:type=Gossiper link and use the unsafeAssassinateEndpoint input. Fill it with the IP of the down node and invoke the function. nodetool gossipinfo should now tell you that this node has left the ring and let you truncate or do whatever you need to do. Use this carefully: this function is composed of the words 'unsafe' and 'assassinate' because it forces the node out of the ring without any check or replication. Alain 2012/11/9 Robin Verlangen ro...@us2.nl Hi there, We have had a crashed node that is currently removed from the rack. However, when I try a schema upgrade / truncate operation it complains about the unreachable node. I tried nodetool removetoken, but that didn't resolve it. Any ideas on how to fix this? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC
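The same MBean operation can also be invoked from the command line with a generic JMX client such as jmxterm, instead of the MX4J web page. A sketch, assuming the default JMX port 7199 and 10.0.0.5 as the dead node's IP (both placeholders):

    $ java -jar jmxterm-1.0-alpha-4-uber.jar
    $> open localhost:7199
    $> bean org.apache.cassandra.net:type=Gossiper
    $> run unsafeAssassinateEndpoint 10.0.0.5
    $> close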
Re: Remove crashed node
Hi Alain, How can I access that? My web browser does not seem to work. Do I need any software to log in? If so, what is the proper software for Windows? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. On Fri, Nov 9, 2012 at 2:50 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: If you run 'describe cluster;' in cassandra-cli, I guess you will see an UNREACHABLE node. If you do, there is a way to remove this unreachable node. Go to the JMX management console (ip_of_one_up_node:8081 by default). Then go to the org.apache.cassandra.net:type=Gossiper link and use the unsafeAssassinateEndpoint input. Fill it with the IP of the down node and invoke the function. nodetool gossipinfo should now tell you that this node has left the ring and let you truncate or do whatever you need to do. Use this carefully: this function is composed of the words 'unsafe' and 'assassinate' because it forces the node out of the ring without any check or replication. Alain 2012/11/9 Robin Verlangen ro...@us2.nl Hi there, We have had a crashed node that is currently removed from the rack. However, when I try a schema upgrade / truncate operation it complains about the unreachable node. I tried nodetool removetoken, but that didn't resolve it. Any ideas on how to fix this? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC
Re: leveled compaction and tombstoned data
On 2012-11-08, at 1:12 PM, B. Todd Burruss bto...@gmail.com wrote: we are having the problem where we have huge SSTables with tombstoned data in them that is not being compacted soon enough (because size-tiered compaction requires, by default, 4 like-sized SSTables). this is using more disk space than we anticipated. we are very write heavy compared to reads, and we delete the data after N days (depends on the column family, but N is around 7 days). my question is: would leveled compaction help get rid of the tombstoned data faster than size-tiered, and therefore reduce the disk space usage? From my experience, levelled compaction makes space reclamation after deletes even less predictable than size-tiered. The reason is that deletes, like all mutations, are just recorded into sstables. They enter level0 and slowly, over time, get promoted upwards to levelN. Depending on your *total* mutation volume vs your data set size, this may be quite a slow process. This is made even worse when the data you're deleting (say, an entire row worth several hundred kilobytes) is to be deleted by a small row-level tombstone. If the row is sitting in level 4, the tombstone won't impact it until enough new data has pushed it up past the existing data in level3, level2, level1, level0. Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) until it has reached the highest populated level in levelled compaction. This means that if you have 4 levels and issue a ton of deletes (even deletes that will never impact existing data), these tombstones are dead weight that cannot be purged until they hit level4. For a write-heavy workload, I recommend you stick with size-tiered. You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along; a sketch follows below. If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night).
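For reference, those knobs can be changed online from cassandra-cli. A sketch with placeholder names and values ('mycf', one-day gc_grace); only lower gc_grace if repair reliably completes well inside it:

    UPDATE COLUMN FAMILY mycf
      WITH gc_grace = 86400
      AND min_compaction_threshold = 2
      AND max_compaction_threshold = 8;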
Re: Remove crashed node
You have to install mx4j-tools.jar: http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J It's a Java tool, so it is usable on both Windows and Linux. Here is the link to download mx4j-tools.jar: http://www.java2s.com/Code/JarDownload/mx4j/mx4j-tools-3.0.2.jar.zip Unzip it, add it to your Cassandra lib directory, restart the node where it is installed, and you should be OK. Alain 2012/11/9 Robin Verlangen ro...@us2.nl Hi Alain, How can I access that? My web browser does not seem to work. Do I need any software to log in? If so, what is the proper software for Windows? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC On Fri, Nov 9, 2012 at 2:50 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: If you run 'describe cluster;' in cassandra-cli, I guess you will see an UNREACHABLE node. If you do, there is a way to remove this unreachable node. Go to the JMX management console (ip_of_one_up_node:8081 by default). Then go to the org.apache.cassandra.net:type=Gossiper link and use the unsafeAssassinateEndpoint input. Fill it with the IP of the down node and invoke the function. nodetool gossipinfo should now tell you that this node has left the ring and let you truncate or do whatever you need to do. Use this carefully: this function is composed of the words 'unsafe' and 'assassinate' because it forces the node out of the ring without any check or replication. Alain 2012/11/9 Robin Verlangen ro...@us2.nl Hi there, We have had a crashed node that is currently removed from the rack. However, when I try a schema upgrade / truncate operation it complains about the unreachable node. I tried nodetool removetoken, but that didn't resolve it. Any ideas on how to fix this? Best regards, Robin Verlangen *Software engineer* W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC
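A sketch of those steps on Linux, assuming a tarball install under /opt/cassandra (the lib path differs by packaging, e.g. /usr/share/cassandra/lib on Debian):

    # download and unpack the jar
    wget http://www.java2s.com/Code/JarDownload/mx4j/mx4j-tools-3.0.2.jar.zip
    unzip mx4j-tools-3.0.2.jar.zip
    # drop it into Cassandra's lib directory and restart that node
    cp mx4j-tools-3.0.2.jar /opt/cassandra/lib/
    sudo service cassandra restart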
Re: Indexing Data in Cassandra with Elastic Search
Thanks for sharing this. We are also using Cassandra + Storm + Queue messaging (Kestrel for now) and are always glad to learn. Alain 2012/11/9 Brian O'Neill b...@alumni.brown.edu For those looking to index data in Cassandra with Elastic Search, here is what we decided to do: http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://brianoneill.blogspot.com/ twitter: @boneill42
Re: leveled compaction and tombstoned data
The rules for tombstone eviction are as follows (regardless of your compaction strategy): 1. gc_grace must be expired, and 2. no other row fragments can exist for the row that aren't also participating in the compaction. For LCS, there is no 'rule' that tombstones can only be evicted at the highest level. They can be evicted at whichever level the row converges on. Depending on your use case this may mean it always happens at level4; it might also mean it most often happens at L1 or L2. On Fri, Nov 9, 2012 at 7:31 AM, Mina Naguib mina.nag...@adgear.com wrote: On 2012-11-08, at 1:12 PM, B. Todd Burruss bto...@gmail.com wrote: we are having the problem where we have huge SSTables with tombstoned data in them that is not being compacted soon enough (because size-tiered compaction requires, by default, 4 like-sized SSTables). this is using more disk space than we anticipated. we are very write heavy compared to reads, and we delete the data after N days (depends on the column family, but N is around 7 days). my question is: would leveled compaction help get rid of the tombstoned data faster than size-tiered, and therefore reduce the disk space usage? From my experience, levelled compaction makes space reclamation after deletes even less predictable than size-tiered. The reason is that deletes, like all mutations, are just recorded into sstables. They enter level0 and slowly, over time, get promoted upwards to levelN. Depending on your *total* mutation volume vs your data set size, this may be quite a slow process. This is made even worse when the data you're deleting (say, an entire row worth several hundred kilobytes) is to be deleted by a small row-level tombstone. If the row is sitting in level 4, the tombstone won't impact it until enough new data has pushed it up past the existing data in level3, level2, level1, level0. Finally, to guard against the tombstone missing any data, the tombstone itself is not a candidate for removal (I believe even after gc_grace has passed) until it has reached the highest populated level in levelled compaction. This means that if you have 4 levels and issue a ton of deletes (even deletes that will never impact existing data), these tombstones are dead weight that cannot be purged until they hit level4. For a write-heavy workload, I recommend you stick with size-tiered. You have several options at your disposal (compaction min/max thresholds, gc_grace) to move things along. If that doesn't help, I've heard of some fairly reputable people doing some fairly blasphemous things (major compactions every night). -- Ben Coverston DataStax -- The Apache Cassandra Company
HugeTLB (Hugepage) Support on a Cassandra Cluster
Hi, Does anyone know if DataStax/Cassandra recommends using HugeTLB on a cluster? Thank you James Morantus Sr. Database Administrator 203-299-8733 Priceline.com
Re: leveled compaction and tombstoned data
On Thu, Nov 8, 2012 at 10:12 AM, B. Todd Burruss bto...@gmail.com wrote: my question is would leveled compaction help to get rid of the tombstoned data faster than size-tiered, and therefore reduce the disk space usage? You could also... 1) run a major compaction 2) code up sstablesplit 3) profit! This method incurs a management penalty if not automated, but is otherwise the most efficient way to deal with tombstones and obsolete data. :D =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
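Step 1 is a one-liner per node; 'myks' and 'mycf' below are placeholder keyspace/column family names. The caveat is that the single giant SSTable a major compaction produces won't be size-tiered-compacted again until several more of similar size exist, which is why it is paired with a split step:

    nodetool -h localhost compact myks mycf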
Re: Strange delay in query
That must be it. I dumped the sstables to json and there are lots of records, including ones that are returned to my application, that have the deletedAt attribute. I think this is because the regular repair job was not running for some time, surely more than the grace period, and lots of tombstones stayed behind even though we are running repair regularly now. Thanks! André On Nov 8, 2012, at 10:51 PM, Josep Blanquer blanq...@rightscale.com wrote: Can it be that you have tons and tons of tombstoned columns in the middle of these two? I've seen plenty of performance issues with wide rows littered with column tombstones (you could check with dumping the sstables...) Just a thought... Josep M. On Thu, Nov 8, 2012 at 12:23 PM, André Cruz andre.c...@co.sapo.pt wrote: These are the two columns in question: = (super_column=13957152-234b-11e2-92bc-e0db550199f4, (column=attributes, value=, timestamp=1351681613263657) (column=blocks, value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q, timestamp=1351681613263657) (column=hash, value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg, timestamp=1351681613263657) (column=icon, value=image_jpg, timestamp=1351681613263657) (column=is_deleted, value=true, timestamp=1351681613263657) (column=is_dir, value=false, timestamp=1351681613263657) (column=mime_type, value=image/jpeg, timestamp=1351681613263657) (column=mtime, value=1351646803, timestamp=1351681613263657) (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, timestamp=1351681613263657) (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, timestamp=1351681613263657) (column=size, value=1379001, timestamp=1351681613263657) (column=thumb_exists, value=true, timestamp=1351681613263657)) = (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4, (column=attributes, value={posix: 420}, timestamp=1351790781154800) (column=blocks, value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ, timestamp=1351790781154800) (column=hash, value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw, timestamp=1351790781154800) (column=icon, value=text_txt, timestamp=1351790781154800) (column=is_dir, value=false, timestamp=1351790781154800) (column=mime_type, value=text/plain, timestamp=1351790781154800) (column=mtime, value=1351378576, timestamp=1351790781154800) (column=name, value=/Documents/VIMDocument.txt, timestamp=1351790781154800) (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, timestamp=1351790781154800) (column=size, value=13, timestamp=1351790781154800) (column=thumb_exists, value=false, timestamp=1351790781154800)) I don't think their size is an issue here. André On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh ailin...@gmail.com wrote: What is the size of columns? Probably those two are huge. On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote: On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote: This error also happens on my application that uses pycassa, so I don't think this is the same bug. I have narrowed it down to a slice between two consecutive columns. 
Observe this behaviour using pycassa: DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys() DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool 51715344 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool 51715344 [UUID('13957152-234b-11e2-92bc-e0db550199f4'), UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')] A two column slice took more than 2s to return. If I request the next 2 column slice: DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'), column_count=2, column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys() DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool 51715344 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool 51715344 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), UUID('a364b028-2449-11e2-8882-e0db550199f4')] This takes 20msec... Is there a rational explanation for this different behaviour? Is there some threshold that I'm running into? Is there any way to obtain more debugging information about this problem? Thanks, André
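For anyone who wants to reproduce this diagnosis: sstable2json ships with Cassandra and can dump a single row so tombstones can be counted. A sketch with a placeholder sstable path (-k takes the row key in hex; the UUID key above becomes the 32 hex digits below):

    bin/sstable2json /var/lib/cassandra/data/MyKS/NsRev/MyKS-NsRev-hd-42-Data.db \
        -k 3cd88d97ffde44ca8ae95336caaebc4e | grep -c deletedAt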
Frame size exceptions occurring with ColumnFamilyInputFormat for very large rows
Hello, I am trying to run a Hadoop job that pulls data out of Cassandra via ColumnFamilyInputFormat, and I am getting a frame size exception. To remedy that, I have set both thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb to an effectively infinite amount (10mb) on all nodes. Moreover, I have restarted the cluster, and the cassandra.yaml files have been reloaded. However, I am still getting:

12/11/09 21:39:52 INFO mapred.JobClient: map 62% reduce 0%
12/11/09 21:40:09 INFO mapred.JobClient: Task Id : attempt_201211082011_0015_m_000479_2, Status : FAILED
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (30046945) larger than max length (16384000)!
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:324)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:189)

Question: Why is 16384000 bytes (I assume) != 10mb? Next, I set this parameter to true as a last Hail Mary attempt: cassandra.input.widerows=true ...still with no luck. Does someone know what I might be missing? Thank you very much for your time, Marko. http://markorodriguez.com
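One observation: 16384000 bytes is exactly 16 * 1024 * 1000, i.e. the old 16mb default, which suggests the limit being hit is not the one you changed. ColumnFamilyInputFormat opens its own Thrift connections, and its frame/message sizes come from the Hadoop job configuration rather than from cassandra.yaml on the server side. A sketch, assuming the 1.1-era ConfigHelper exposes these setters (verify the method names against your version before relying on them):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.cassandra.hadoop.ConfigHelper;

    Configuration conf = job.getConfiguration();
    // Raise the client-side Thrift limits for the job (values in MB).
    ConfigHelper.setThriftFramedTransportSizeInMb(conf, 64);
    ConfigHelper.setThriftMaxMessageLengthInMb(conf, 64);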
Re: backup/restore from sstable files ?
On Thu, Nov 8, 2012 at 5:15 PM, Yang tedd...@gmail.com wrote: some of my colleagues seem to use this method to backup/restore a cluster, successfully: on each of the nodes, save the entire /cassandra/data/ dir to S3, then on a new set of nodes, with exactly the same number of nodes, copy back each of the data/ dirs, then boot up the cluster. Yep, that works as long as the two clusters have the same tokens and replication strategies. but I wonder how it worked: doesn't the system keyspace store information specific to the current cluster, such as my sibling nodes in the cluster, my IP? all these would change once you copy the frozen data files onto a new set of nodes. Yes, for this reason you should not restore the system keyspace files (except, optionally, Schema). Definitely you should not restore LocationInfo; LocationInfo contains ip-to-token mappings. Also, you should give your target cluster a unique cluster name, and the old cluster name is also stored in LocationInfo... =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
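A sketch of the restore step with those caveats applied, assuming the default data layout and per-node backups under s3://mybucket/node1/ (bucket and paths are placeholders):

    # restore everything except the system keyspace directory;
    # LocationInfo lives there and must not come along
    s3cmd sync --exclude 'system/*' \
        s3://mybucket/node1/data/ /var/lib/cassandra/data/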
Re: unsubscribe
On Thu, Nov 8, 2012 at 4:57 PM, Jeremy McKay jeremy.mc...@ntrepidcorp.com wrote: http://wiki.apache.org/cassandra/FAQ#unsubscribe -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: read request distribution
I think the row whose key falls into the token range of the high-latency node is likely to have more columns than rows on the other nodes. I have three nodes with RF = 3, so all the nodes have all the data. And CL = QUORUM, meaning each request is sent to all three nodes and the response is sent back to the client when two of them respond. What exactly does Read Count from nodetool cfstats mean then; should it be the same across all the nodes? I checked with Hector: it uses a round-robin LB strategy. I also tested writes, and the writes are distributed across the cluster evenly. Below is the output from nodetool. Anyone have a clue what might have happened?

Node 1: Read Count: 318679 Read Latency: 72.47641436367003 ms. Write Count: 158680 Write Latency: 0.07918750315099571 ms.
Node 2: Read Count: 251079 Read Latency: 86.91948475579399 ms. Write Count: 158450 Write Latency: 0.1744694540864626 ms.
Node 3: Read Count: 149876 Read Latency: 168.14125553123915 ms. Write Count: 157896 Write Latency: 0.06468631250949992 ms.

nodetool ring:
Address     DC           Rack   Status  State   Load      Effective-Ownership  Token
                                                                               113427455640312821154458202477256070485
10.1.3.152  datacenter1  rack1  Up      Normal  35.85 GB  100.00%              0
10.1.3.153  datacenter1  rack1  Up      Normal  35.86 GB  100.00%              56713727820156410577229101238628035242
10.1.3.155  datacenter1  rack1  Up      Normal  35.85 GB  100.00%              113427455640312821154458202477256070485

Keyspace: benchmark: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:3] I am really confused by the Read Count number from nodetool cfstats. Really appreciate any hints. -Wei From: Wei Zhu wz1...@yahoo.com To: Cassandra user group user@cassandra.apache.org Sent: Thursday, November 8, 2012 9:37 PM Subject: read request distribution Hi All, I am doing a benchmark on Cassandra. I have a three-node cluster with RF=3. I generated 6M rows with sequence numbers from 1 to 6M, so the rows should be evenly distributed among the three nodes, disregarding the replicas. I am benchmarking with read-only requests, generating read requests for randomly chosen keys from 1 to 6M. Oddly, nodetool cfstats reports that one node has only half the requests of another, and the third node sits in the middle, so the ratio is like 2:3:4. The node with the most read requests actually has the smallest latency, and the one with the least read requests reports the largest latency. The difference is pretty big: the fastest is almost double the slowest. All three nodes have exactly the same hardware, and the data size on each node is the same, since the RF is three and all of them have the complete data. I am using Hector as the client, and the random read requests are in the millions. I can't think of a reasonable explanation. Can someone please shed some light? Thanks. -Wei
Retrieve Multiple CFs from Range Slice
Hi! Is there a way to retrieve the columns for all column families on a given row while fetching range slices? My keyspace has two column families, and when I'm scanning over the rows I'd like to be able to fetch the columns in both CFs while iterating over the keys, so as to avoid having to run two scan operations. When I set the CF to an empty string via ColumnParent.setColumn_family(), it throws the error "non-empty columnfamily is required". (Using the Thrift API directly from Java on Cass 1.1.6.) My HBase scans can return both CFs per row, so it works nicely there. Thanks!
Re: Retrieve Multiple CFs from Range Slice
HBase is different in this regard: a table is comprised of multiple column families, and they can be scanned at once. However, last time I checked, scanning a table with two column families still costs a seek per column family. A similar thing can be accomplished in Cassandra by issuing two range scans, possibly executing them asynchronously in two threads (a sketch follows below). I am sure someone will correct me if I am mistaken. On Fri, Nov 9, 2012 at 11:46 PM, Chris Larsen clar...@euphoriaaudio.com wrote: Hi! Is there a way to retrieve the columns for all column families on a given row while fetching range slices? My keyspace has two column families, and when I'm scanning over the rows I'd like to be able to fetch the columns in both CFs while iterating over the keys, so as to avoid having to run two scan operations. When I set the CF to an empty string via ColumnParent.setColumn_family(), it throws the error "non-empty columnfamily is required". (Using the Thrift API directly from Java on Cass 1.1.6.) My HBase scans can return both CFs per row, so it works nicely there. Thanks!
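A minimal sketch of the two-scans approach against the raw Thrift API, assuming Cassandra 1.1.x bindings, a keyspace 'myks', and column families 'cf1' and 'cf2' (all placeholders). Each task opens its own connection because Thrift clients are not thread-safe, and paging past the first batch of rows is omitted:

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.concurrent.*;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class ParallelScan {
        // Scan the first batch of rows of one CF; one connection per task.
        static Callable<List<KeySlice>> scan(final String cf) {
            return new Callable<List<KeySlice>>() {
                public List<KeySlice> call() throws Exception {
                    TFramedTransport transport =
                            new TFramedTransport(new TSocket("localhost", 9160));
                    transport.open();
                    Cassandra.Client client =
                            new Cassandra.Client(new TBinaryProtocol(transport));
                    client.set_keyspace("myks");
                    // Empty start/finish = all columns, up to 100 per row.
                    SlicePredicate pred = new SlicePredicate().setSlice_range(
                            new SliceRange(ByteBuffer.allocate(0),
                                           ByteBuffer.allocate(0), false, 100));
                    // Empty start/end key = begin at the start of the ring.
                    KeyRange range = new KeyRange(100)
                            .setStart_key(ByteBuffer.allocate(0))
                            .setEnd_key(ByteBuffer.allocate(0));
                    List<KeySlice> rows = client.get_range_slices(
                            new ColumnParent(cf), pred, range,
                            ConsistencyLevel.ONE);
                    transport.close();
                    return rows;
                }
            };
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            Future<List<KeySlice>> cf1 = pool.submit(scan("cf1"));
            Future<List<KeySlice>> cf2 = pool.submit(scan("cf2"));
            // Both scans walk keys in the same token order, so the two
            // result lists can be merged by key here.
            System.out.println(cf1.get().size() + " + " + cf2.get().size());
            pool.shutdown();
        }
    }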