Re: cassandra 2.0.6 refuses to start
Strange, could you paste here the output of:

    $ grep -n exec ./bin/cassandra

On Mon, Mar 31, 2014 at 8:05 PM, Tim Dunphy bluethu...@gmail.com wrote:
> > Is SELinux enabled?
>
> Nope! It's disabled.

On Mon, Mar 31, 2014 at 2:50 PM, Michael Shuler mich...@pbandjelly.org wrote:
> Is SELinux enabled?

--
GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Finding cut-off points
Hi,

I have a large amount (can be 100 million) of (id uuid, score int) entries in Cassandra. I need to, at regular intervals of, let's say, 30-60 minutes, find the cut-off points for the score needed to be in the top 0.1%, 33% and 66% of all scores.

What would a good approach be to this problem? All the data won't fit into memory, so regular sorting on the application side won't be possible (unless I do it using a merge sort algorithm with files, which feels like a bad solution). Iterating over the data once and building a histogram would cut down the required memory usage quite significantly, but I'm afraid this could still end up being too big. Are there any easier ways to do these computations?

Lastly, I've thought about the possibility of using analytics tools to compute these things for me - would setting up Hadoop and/or Pig help me do this in a manner that makes the results accessible to the application servers once done? I've had a hard time finding any guides on how to set it up and what exactly I'd be able to do with it afterwards.

Any pointers would be much appreciated.

Best regards,
Kasper
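For what it's worth, the single-pass histogram idea from the question can be sketched in plain Java. This assumes scores fall in a known bounded integer range (the bound, the class name, and the data here are illustrative assumptions, not from the thread): one counter per possible score makes the histogram tiny, so 100 million rows become one iteration with constant memory.

```java
import java.util.Random;

// Sketch: one bucket per possible score, filled in a single pass over the
// data, then cut-offs found by walking the counts from the top score down.
public class ScoreCutoffs {

    // Smallest score s such that at least total*topFraction entries
    // have score >= s.
    static int cutoff(long[] histogram, long total, double topFraction) {
        long needed = (long) Math.ceil(total * topFraction);
        long seen = 0;
        for (int score = histogram.length - 1; score >= 0; score--) {
            seen += histogram[score];
            if (seen >= needed) {
                return score;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int maxScore = 10_000;                    // assumed bound on scores
        long[] histogram = new long[maxScore + 1];
        Random rnd = new Random(42);
        long total = 1_000_000;
        // In practice this loop would be an iteration over the Cassandra
        // table (e.g. via paged range queries); random data stands in here.
        for (long i = 0; i < total; i++) {
            histogram[rnd.nextInt(maxScore + 1)]++;
        }
        System.out.println(cutoff(histogram, total, 0.001)); // top 0.1%
        System.out.println(cutoff(histogram, total, 0.33));  // top 33%
        System.out.println(cutoff(histogram, total, 0.66));  // top 66%
    }
}
```

If scores are unbounded or sparse, the same walk works over fixed-width buckets at the cost of some precision in the cut-off.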
Dead node appearing in datastax driver
Hello All,

We had a 4 node cassandra 2.0.4 cluster (let's call them host1, host2, host3 and host4), out of which we've removed one node (host4) using the nodetool removenode command. Now using nodetool status or nodetool ring we no longer see host4. It's also not appearing in DataStax OpsCenter. But it's intermittently appearing in Metadata.getAllHosts() while connecting using DataStax driver 1.0.4.

Couple of questions:
- How is it appearing?
- Can this have an impact on read / write performance of the client?

The code which we are using to connect is:

    public void connect() {
        PoolingOptions poolingOptions = new PoolingOptions();
        cluster = Cluster.builder()
                .addContactPoints(inetAddresses.toArray(new String[]{}))
                .withLoadBalancingPolicy(new RoundRobinPolicy())
                .withPoolingOptions(poolingOptions)
                .withPort(port)
                .withCredentials(username, password)
                .build();
        Metadata metadata = cluster.getMetadata();
        System.out.printf("Connected to cluster: %s\n", metadata.getClusterName());
        for (Host host : metadata.getAllHosts()) {
            System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
                    host.getDatacenter(), host.getAddress(), host.getRack());
        }
    }

--
Thanks & Regards,
Apoorva
Re: Dead node appearing in datastax driver
On Tue, Apr 1, 2014 at 12:50 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:
> We had a 4 node cassandra 2.0.4 cluster, out of which we've removed one node (host4) using the nodetool removenode command. [...]
>
> - How is it appearing?

Not sure. Can you try querying the peers system table on each of your nodes (with cqlsh: SELECT * FROM system.peers) and see if host4 is still mentioned somewhere?

> - Can this have an impact on read / write performance of the client?

No. If the host doesn't exist, the driver might try to reconnect to it at times, but since it won't be able to, it won't try to use it for reads and writes. That does mean you might have a reconnection task running with some regularity, but 1) it's not on the write/read path of queries and 2) provided you've left the default reconnection policy, this will happen once every 10 minutes and will be pretty cheap, so it will consume a completely negligible amount of resources. That doesn't mean I'm not interested in tracking down why that happens in the first place, though.

-- Sylvain
Re: Dead node appearing in datastax driver
Hello Sylvain,

Queried system.peers on three live nodes and host4 is appearing on two of these.

On Tue, Apr 1, 2014 at 5:06 PM, Sylvain Lebresne sylv...@datastax.com wrote:
> Not sure. Can you try querying the peers system table on each of your nodes (with cqlsh: SELECT * FROM system.peers) and see if host4 is still mentioned somewhere? [...]

--
Thanks & Regards,
Apoorva
Re: Dead node appearing in datastax driver
Did that and I actually see a significant reduction in write latency.

On Tue, Apr 1, 2014 at 5:35 PM, Sylvain Lebresne sylv...@datastax.com wrote:
> On Tue, Apr 1, 2014 at 1:49 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:
> > Queried system.peers on three live nodes and host4 is appearing on two of these.
>
> That's why the driver thinks they are still there. You're most probably running into https://issues.apache.org/jira/browse/CASSANDRA-6053 since you are on C* 2.0.4. As said, this is relatively harmless, but you should think about upgrading to 2.0.6 to fix it in the future (you could manually remove the bad entries in system.peers in the meantime if you want; they are really just leftovers that shouldn't be there).
>
> -- Sylvain

--
Thanks & Regards,
Apoorva
Specifying startBefore with iterators with compositeKeys
I have a composite key consisting of (integer, bytes), and I have rows like (1,abc), (1,def), (2,abc), (2,def), and I want to find all rows with the integer part = 2. I need to create a startBeyondName using the CompositeType.Builder class and am wondering if specifying (2, Bytes.Empty) will sort correctly?

I think another way of saying this is: does a HeapByteBuffer with pos=0 lim=0 cap=0 sort prior to any other possible HeapByteBuffer?

Thanks.
Re: Dead node appearing in datastax driver
What does "Did that" mean? Does it mean you upgraded to 2.0.6, or that you manually removed entries from system.peers? If the latter, I'd need more info on what you did exactly, what your peers table looked like before and what it looks like now: there is no reason deleting the peers entries for hosts that are no longer part of the cluster would have anything to do with write latency (though if, say, you removed the wrong entries, that might have made the driver think some live host had been removed, and if the driver has fewer nodes to use to dispatch queries, that might impact latency, I suppose -- at least that's the only related thing I can think of).

-- Sylvain

On Tue, Apr 1, 2014 at 2:44 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:
> Did that and I actually see a significant reduction in write latency. [...]
Re: Finding cut-off points
Hi Kasper,

I'd suggest taking a look at Spark, Storm, or Samza (all are Apache projects) for a possible approach. Depending on your needs and your existing infrastructure, one of those may work better than others for you.

Steve

On Tue, Apr 1, 2014 at 2:51 AM, Kasper Petersen kas...@sybogames.com wrote:
> I have a large amount (can be 100 million) of (id uuid, score int) entries in Cassandra. I need to, at regular intervals, find the cut-off points for the score needed to be in the top 0.1%, 33% and 66% of all scores. [...]

--
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063
srobe...@stanford.edu
http://highwire.stanford.edu
Re: Timeuuid inserted with now(), how to get the value back in Java client?
No, there's no way. You should generate the TIMEUUID on the client side so that you have it.

T#

On Sat, Mar 29, 2014 at 1:01 AM, Andy Atj2 andya...@gmail.com wrote:
> I'm writing a Java client to a Cassandra db. One of the main primary keys is a timeuuid. I plan to do INSERTs using now() and have Cassandra generate the value of the timeuuid.
>
> After the INSERT, I need the Cassandra-generated timeuuid value. Is there an easy way to get it, without having to re-query for the record I just inserted, hoping to get only one record back? Remember, I don't have the PK.
>
> E.g., in every other db there's a way to get the generated PK back. In SQL it's @@identity, in Oracle it's... etc etc. I know Cassandra is not an RDBMS. All I want is the value Cassandra just generated.
>
> Thanks,
> Andy
Re: Timeuuid inserted with now(), how to get the value back in Java client?
You would get a UUID object from the Cassandra API. Then you may use uuid.timestamp() to get the timestamp for the same.

-Vivek

On Tue, Apr 1, 2014 at 9:55 PM, Theo Hultberg t...@iconara.net wrote:
> No, there's no way. You should generate the TIMEUUID on the client side so that you have it. [...]
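For reference, converting uuid.timestamp() into a wall-clock time needs an epoch shift, since java.util.UUID reports 100-nanosecond intervals since the UUID epoch (1582-10-15), not the Unix epoch. A minimal JDK-only sketch (the sample UUID below is made up for illustration):

```java
import java.util.UUID;

// Sketch: recover the creation time of a version-1 (time-based) UUID,
// i.e. a CQL timeuuid, using only the JDK.
public class TimeuuidMillis {
    // Number of 100-ns intervals between 1582-10-15 and 1970-01-01
    // (the standard RFC 4122 offset).
    static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    static long unixMillis(UUID timeuuid) {
        if (timeuuid.version() != 1) {
            // timestamp() throws for non-time-based UUIDs anyway; fail clearly.
            throw new IllegalArgumentException("not a time-based UUID");
        }
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    public static void main(String[] args) {
        // Illustrative v1 UUID (made up for this example).
        UUID u = UUID.fromString("5b7c5b20-b9d1-11e3-a5e2-0800200c9a66");
        System.out.println(unixMillis(u)); // Unix epoch milliseconds
    }
}
```

Of course, per Theo's point, this only helps if you already have the UUID object; it doesn't recover a server-side now() value.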
Change IP address for all nodes
Hi,

Due to some network renumbering, we will need to change the IP addresses (and networks) of all nodes in our cluster. Will the following procedure work, or is there anything special we'll need to consider?

1. shut down all nodes
2. update listen_address, rpc_address, seeds in cassandra.yaml
3. update cassandra-topology.properties
4. start all nodes

I'm asking because I remember some older reports (from 2011) about possible issues. We are currently on 1.2.x.

Thanks,
Christof
Re: Change IP address for all nodes
On Tue, Apr 1, 2014 at 12:53 PM, Christof Roduner chris...@scandit.comwrote: Will the following procedure work, or is there anything special we'll need to consider? If you do not use auto_bootstrap:false, this will not work. https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/ Be careful, within some versions of 1.2.x there is an issue with using auto_bootstrap which can result in broken node state in gossip. Do not do this procedure in one of those versions! If you have trouble finding it in JIRA/CHANGES.txt, let me know and I can try to dig it up. =Rob
Re: Read performance in map data type
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav apoorva.gau...@myntra.comwrote: Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202... =Rob
Re: Dead node appearing in datastax driver
I manually removed entries from system.peers. The improvements can well be coincidental, as various other apps were also running on the same test bed.

On Tue, Apr 1, 2014 at 8:43 PM, Sylvain Lebresne sylv...@datastax.com wrote:
> What does "Did that" mean? Does it mean you upgraded to 2.0.6, or that you manually removed entries from system.peers? If the latter, I'd need more info on what you did exactly, what your peers table looked like before and what it looks like now. [...]
>
> -- Sylvain

--
Thanks & Regards,
Apoorva
Re: Opscenter help?
- Were you running a mixed EL5/EL6 environment? - What exact version of Cassandra were you upgrading from? As for SE, you have to amass a bit of their magic points before you can start to take more actions (tagging, down-voting, etc). Surprisingly good at keeping down spam. -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com
Drop in node replacements.
Is it possible to have true drop-in node replacements?

For example, I have a cluster of 51 Cassandra nodes, 17 in each data center. I had one host go down in DC3, and when it came back up, it joined the ring, etc., but was not receiving any data. Even after multiple restarts and forcing a repair on the entire fleet, it still holds maybe ~30MB on a cluster that is absorbing ~1.2TB a day.

On top of that, I decided to see if I could recreate it by taking down a node, reprovisioning it, and then throwing it back in WITHOUT having it take over the old node's tokens. It never seems to absorb any of the old data after a full repair, and it never seems to start loading new data (I now have 3 nodes using ~30MB).

Am I doing something wrong? I would imagine a repair on the entire cluster (across 3 DCs) would force C* to put some copies onto the other node, but this doesn't seem to be the case. What can I do?

Andrew
Re: Specifying startBefore with iterators with compositeKeys
On Tue, Apr 1, 2014 at 9:53 AM, Brian Tarbox tar...@cabotresearch.comwrote: I think another way of saying this is: does HeapByteBuffer with pos=,lim=0,cap=0 sort prior to any other possible HeapByteBuffer? Yes. However, if you use it as a slice finish, an empty ByteBuffer is greater than any other value. -- Tyler Hobbs DataStax http://datastax.com/
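The byte-ordering intuition behind Tyler's answer is easy to check with plain java.nio: ByteBuffer's compareTo is lexicographic over the remaining bytes, so an empty buffer compares before any non-empty one. (Cassandra's actual comparator is CompositeType, not ByteBuffer.compareTo, so this is only a sketch of the ordering intuition, not of Cassandra's code path.)

```java
import java.nio.ByteBuffer;

// Quick check that an empty ByteBuffer (pos=0 lim=0 cap=0) orders before
// any non-empty buffer under lexicographic comparison, which is why an
// empty component works as a slice *start* but acts as "greater than
// everything" when used as a slice *finish*.
public class EmptyBufferOrder {
    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer abc = ByteBuffer.wrap(new byte[] {'a', 'b', 'c'});
        ByteBuffer zero = ByteBuffer.wrap(new byte[] {0});
        System.out.println(empty.compareTo(abc) < 0);  // true
        System.out.println(empty.compareTo(zero) < 0); // true: even a single zero byte sorts after empty
    }
}
```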
Re: Row cache for writes
On Mon, Mar 31, 2014 at 11:37 AM, Wayne Schroeder wschroe...@pinsightmedia.com wrote:
> I found a lot of documentation about the read path for key and row caches, but I haven't found anything in regard to the write path. My app has the need to record a large quantity of very short lived temporal data that will expire within seconds and only have a small percentage of the rows accessed before they expire. Ideally, and I have done the math, I would like the data to never hit disk and just stay in memory once written until it expires. How might I accomplish this?

It's not perfect, but set a short TTL on the data and set gc_grace_seconds to 0 for the table. Tombstones will still be written to disk, but almost everything will be discarded in its first compaction. You could also lower the min compaction threshold for size-tiered compaction to 2 to force compactions to happen more quickly.

> I am not concerned about data consistency at all on this so if I could even avoid the commit log, that would be even better.

You can set durable_writes = false for the keyspace.

> My main concern is that I don't see any evidence that writes end up in the cache--that it takes at least one read to get it into the cache. I also realize that, assuming I don't cause SSTable writes due to sheer quantity, that the data would be in memory anyway. Has anyone done anything similar to this that could provide direction?

Writes invalidate row cache entries, so that's not what you want.

--
Tyler Hobbs
DataStax
http://datastax.com/
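Put together, those suggestions might look like this in CQL (keyspace, table, and the TTL/replication numbers are made-up placeholders; tune them to the actual workload):

```sql
-- Keyspace that skips the commit log entirely
-- (accepting data loss on node crash).
CREATE KEYSPACE ephemeral
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
  AND durable_writes = false;

-- Short-lived data: rows expire via TTL, tombstones are purgeable
-- immediately, and size-tiered compaction runs as soon as two
-- sstables exist.
CREATE TABLE ephemeral.events (
  id timeuuid PRIMARY KEY,
  payload blob
) WITH default_time_to_live = 60
  AND gc_grace_seconds = 0
  AND compaction = {'class': 'SizeTieredCompactionStrategy',
                    'min_threshold': 2};

-- Per-write TTLs also work:
INSERT INTO ephemeral.events (id, payload) VALUES (now(), 0x00) USING TTL 30;
```

Note that gc_grace_seconds = 0 is only safe here because the data is disposable; on replicated data you normally keep it above your repair interval.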
Re: Drop in node replacements.
On Tue, Apr 1, 2014 at 3:24 PM, Redmumba redmu...@gmail.com wrote: Is it possible to have true drop in node replacements? For example, I have a cluster of 51 Cassandra nodes, 17 in each data center. I had one host go down on DC3, and when it came back up, it joined the ring, etc., but was not receiving any data. Even after multiple restarts and forcing a repair on the entire fleet, it still holds maybe ~30MB on a cluster that is absorbing ~1.2TB a day. What version of Cassandra? Real hardware/network or virtual? =Rob
Re: Read performance in map data type
Thanks Sourabh,

I've modelled my table as (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)), as primarily I'll be querying using studentID and sometimes using studentID and subjectID. I've tried driver 2.0.0 and it's giving good results. I'm also using its auto paging feature.

Any idea what a typical value for fetch size should be? And does the fetch size depend on how many columns there are in the CQL table? E.g., should the fetch size in a table like (studentID int, subjectID int, marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY KEY(studentID, subjectID)) be less than the fetch size in (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID))?

On Wed, Apr 2, 2014 at 2:20 AM, Robert Coli rc...@eventbrite.com wrote:
> Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202...
>
> =Rob

--
Thanks & Regards,
Apoorva
Re: Read performance in map data type
From the doc: "The fetch size controls how much resulting rows will be retrieved simultaneously." So I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think it wouldn't matter much whatever the number of columns is, as long as we have enough memory in the app. The default value is 5000 (com.datastax.driver.core.QueryOptions). We use it with the default value.

I have never profiled Cassandra for read load. If you profile it for different fetch sizes, please share the results :)

On Wed, Apr 2, 2014 at 8:45 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:
> Thanks Sourabh, I've modelled my table as (studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)). [...] Any idea what a typical value for fetch size should be? [...]

--
Sourabh Agrawal
Bangalore
+91 9945657973