Re: cassandra 2.0.6 refuses to start

2014-04-01 Thread Marcin Cabaj
Strange,

could you paste here the output of:

$ grep -n exec ./bin/cassandra


On Mon, Mar 31, 2014 at 8:05 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Is SELinux enabled?


 Nope! It's disabled.


 On Mon, Mar 31, 2014 at 2:50 PM, Michael Shuler mich...@pbandjelly.org wrote:

 Is SELinux enabled?




 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




Finding cut-off points

2014-04-01 Thread Kasper Petersen
Hi,

I have a large amount (can be 100 million) of (id uuid, score int) entries
in Cassandra. I need to, at regular intervals of let's say 30-60 minutes,
find the cut-off points for the score needed to be in the top 0.1%, 33% and
66% of all scores.

What would a good approach be to this problem?

All the data won't fit into memory, thus regular sorting on the
application side won't be possible (unless I do it using a merge sort
algorithm with files, which feels like a bad solution).

Iterating over the data once and building a histogram would cut down the
required memory usage quite significantly, but I'm afraid this could still
end up being too big. Are there any easier ways to do these computations?
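A one-pass histogram of the kind described above can be sketched as follows in Java (class and method names are made up for illustration; it assumes scores fall in a known integer range 0..maxScore, so memory is O(maxScore) regardless of row count):

```java
// Sketch: one pass over all (id, score) rows builds a fixed-size histogram;
// a cumulative scan from the top then yields the score cut-off for any
// "top X%" fraction. Names are illustrative, and scores are assumed to fit
// a known integer range 0..maxScore.
public class ScoreCutoffs {
    private final long[] buckets; // one bucket per integer score
    private long total;

    public ScoreCutoffs(int maxScore) {
        this.buckets = new long[maxScore + 1];
    }

    public void add(int score) {
        buckets[score]++;
        total++;
    }

    // Smallest score s such that at least `fraction` of all scores are >= s.
    public int cutoffForTopFraction(double fraction) {
        long needed = (long) Math.ceil(total * fraction);
        long seen = 0;
        for (int s = buckets.length - 1; s >= 0; s--) {
            seen += buckets[s];
            if (seen >= needed) {
                return s;
            }
        }
        return 0;
    }
}
```

Iterating the rows once and feeding each score to add() yields all the cut-offs from the same pass; if the score range is huge, the same idea works with fixed-width buckets at the cost of some cut-off precision.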

Lastly, I've thought about the possibility of using analytics tools to compute
these things for me - would setting up Hadoop and/or Pig help me do this in
a manner that could make the results accessible to the application servers
once done? I've had a hard time finding any guides on how to set it up and
what exactly I'd be able to do with it afterwards. Any pointers would be
much appreciated.


Best regards,
Kasper


Dead node appearing in datastax driver

2014-04-01 Thread Apoorva Gaurav
Hello All,

We had a 4-node Cassandra 2.0.4 cluster (let's call them host1, host2,
host3 and host4), out of which we've removed one node (host4) using the
nodetool removenode command. Now using nodetool status or nodetool ring we
no longer see host4. It's also not appearing in DataStax OpsCenter. But it's
intermittently appearing in Metadata.getAllHosts() while connecting using
DataStax driver 1.0.4.

Couple of questions:
- How is it appearing?
- Can this have an impact on read/write performance of the client?

Code which we are using to connect is

 public void connect() {
     PoolingOptions poolingOptions = new PoolingOptions();
     cluster = Cluster.builder()
             .addContactPoints(inetAddresses.toArray(new String[]{}))
             .withLoadBalancingPolicy(new RoundRobinPolicy())
             .withPoolingOptions(poolingOptions)
             .withPort(port)
             .withCredentials(username, password)
             .build();
     Metadata metadata = cluster.getMetadata();
     System.out.printf("Connected to cluster: %s\n", metadata.getClusterName());
     for (Host host : metadata.getAllHosts()) {
         System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
                 host.getDatacenter(), host.getAddress(), host.getRack());
     }
 }



-- 
Thanks & Regards,
Apoorva


Re: Dead node appearing in datastax driver

2014-04-01 Thread Sylvain Lebresne
On Tue, Apr 1, 2014 at 12:50 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:

 Hello All,

 We had a 4 node cassandra 2.0.4 cluster  ( lets call them host1, host2,
 host3 and host4), out of which we've removed one node (host4) using
 nodetool removenode command. Now using nodetool status or nodetool ring we
 no longer see host4. It's also not appearing in Datastax opscenter. But its
 intermittently appearing in Metadata.getAllHosts() while connecting using
 datastax driver 1.0.4.

 Couple of questions :-
 -How is it appearing.


Not sure. Can you try querying the peers system table on each of your nodes
(with cqlsh: SELECT * FROM system.peers) and see if the host4 is still
mentioned somewhere?


 -Can this have impact on read / write performance of client.


No. If the host doesn't exist, the driver might try to reconnect to it at
times, but since it won't be able to, it won't try to use it for reads and
writes. That does mean you might have a reconnection task running with some
regularity, but 1) it's not on the write/read path of queries and 2)
provided you've left the default reconnection policy, this will happen once
every 10 minutes and will be pretty cheap, so it will consume a
completely negligible amount of resources. That doesn't mean I'm not
interested in tracking down why that happens in the first place, though.

--
Sylvain







Re: Dead node appearing in datastax driver

2014-04-01 Thread Apoorva Gaurav
Hello Sylvain,

Queried system.peers on three live nodes and host4 is appearing on two of
these.



-- 
Thanks & Regards,
Apoorva


Re: Dead node appearing in datastax driver

2014-04-01 Thread Apoorva Gaurav
Did that and I actually see a significant reduction in write latency.


On Tue, Apr 1, 2014 at 5:35 PM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Tue, Apr 1, 2014 at 1:49 PM, Apoorva Gaurav 
 apoorva.gau...@myntra.comwrote:

 Hello Sylvian,

 Queried system.peers on three live nodes and host4 is appearing on two of
 these.


 That's why the driver thinks they are still there. You're most probably
 running into https://issues.apache.org/jira/browse/CASSANDRA-6053 since
 you are on C* 2.0.4. As said, this is relatively harmless, but you should
 think about upgrading to 2.0.6 to fix it in the future (you could manually
 remove the bad entries in system.peers in the meantime if you want; they
 are really just leftovers that shouldn't be there).

 --
 Sylvain
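The manual cleanup mentioned above might look like this in cqlsh (a sketch; '<host4-ip>' is a placeholder for the removed node's address, and you should check the SELECT output first so you only delete rows for hosts that are genuinely gone):

```sql
-- Inspect which peers each node still knows about:
SELECT peer, rpc_address FROM system.peers;

-- On each node that still lists the dead host, remove the stale row:
DELETE FROM system.peers WHERE peer = '<host4-ip>';
```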








-- 
Thanks & Regards,
Apoorva


Specifying startBefore with iterators with compositeKeys

2014-04-01 Thread Brian Tarbox
I have a composite key consisting of (integer, bytes), and I have rows like
(1,abc), (1,def), (2,abc), (2,def), and I want to find all rows with the
integer part = 2.

I need to create a startBeyondName using the CompositeType.Builder class and am
wondering if specifying (2, Bytes.Empty) will sort correctly?

I think another way of saying this is: does a HeapByteBuffer with
pos=0,lim=0,cap=0 sort prior to any other possible HeapByteBuffer?

Thanks.


Re: Dead node appearing in datastax driver

2014-04-01 Thread Sylvain Lebresne
What does "Did that" mean? Does that mean "I upgraded to 2.0.6", or does
that mean "I manually removed entries from system.peers"? If the latter,
I'd need more info on what you did exactly, and what your peers tables looked
like before and how they look now: there is no reason deleting the
peers entries for hosts that are not part of the cluster anymore would have
anything to do with write latency (but if, say, you've removed wrong entries,
that might have made the driver think some live host had been removed, and
if the driver has fewer nodes to use to dispatch queries, that might impact
latency, I suppose -- at least that's the only related thing I can think of).

--
Sylvain


On Tue, Apr 1, 2014 at 2:44 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:

 Did that and I actually see a significant reduction in write latency.





Re: Finding cut-off points

2014-04-01 Thread Steven A Robenalt
Hi Kasper,

I'd suggest taking a look at Spark, Storm, or Samza (all are Apache
projects) for a possible approach. Depending on your needs and your
existing infrastructure, one of those may work better than others for you.

Steve






-- 
Steve Robenalt
Software Architect
HighWire | Stanford University
425 Broadway St, Redwood City, CA 94063

srobe...@stanford.edu
http://highwire.stanford.edu


Re: Timeuuid inserted with now(), how to get the value back in Java client?

2014-04-01 Thread Theo Hultberg
No, there's no way. You should generate the TIMEUUID on the client side so
that you have it.

T#


On Sat, Mar 29, 2014 at 1:01 AM, Andy Atj2 andya...@gmail.com wrote:

 I'm writing a Java client to a Cassandra db.

 One of the main primary keys is a timeuuid.

 I plan to do INSERTs using now() and have Cassandra generate the value of
 the timeuuid.

 After the INSERT, I need the Cassandra-generated timeuuid value. Is there
 an easy way to get it, without having to re-query for the record I just
 inserted, hoping to get only one record back? Remember, I don't have the PK.

 E.g., in every other db there's a way to get the generated PK back. In SQL
 it's @@identity, in Oracle it's... etc.

 I know Cassandra is not an RDBMS. All I want is the value Cassandra just
 generated.

 Thanks,
 Andy




Re: Timeuuid inserted with now(), how to get the value back in Java client?

2014-04-01 Thread Vivek Mishra
You would get a UUID object from the Cassandra API. Then you may use
uuid.timestamp() to get the timestamp for the same.

-Vivek
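As a sketch of the client-side approach suggested above, a version 1 (time-based) UUID can be built with only the JDK. Real code would normally use a driver helper (e.g. the DataStax Java driver's UUIDs.timeBased()) rather than hand-rolling the bit layout, and the node bits below are a placeholder, not a proper unique node id:

```java
import java.util.UUID;

// Sketch: a client-generated time-based (version 1) UUID using only the JDK.
// The node bits are a placeholder for illustration; a real implementation
// also handles uniqueness within the same millisecond via the clock sequence.
public class TimeUuids {
    // UUID v1 timestamps count 100-ns intervals since 1582-10-15 (Gregorian epoch).
    private static final long GREGORIAN_OFFSET = 0x01B21DD213814000L;

    public static UUID fromMillis(long millis, long node) {
        long ts = millis * 10_000 + GREGORIAN_OFFSET; // 100-ns units
        long msb = (ts << 32)                    // time_low: low 32 bits of ts
                 | ((ts >>> 16) & 0xFFFF0000L)   // time_mid
                 | ((ts >>> 48) & 0x0FFFL)       // time_hi
                 | 0x1000L;                      // version 1
        long lsb = 0x8000000000000000L           // RFC 4122 variant
                 | (node & 0x0000FFFFFFFFFFFFL); // 48-bit "node" (placeholder here)
        return new UUID(msb, lsb);
    }

    // Inverse of the encoding above; UUID#timestamp() only works for version 1.
    public static long toMillis(UUID uuid) {
        return (uuid.timestamp() - GREGORIAN_OFFSET) / 10_000;
    }
}
```

Note that uuid.timestamp() (and hence toMillis) throws UnsupportedOperationException for any UUID that is not version 1.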




Change IP address for all nodes

2014-04-01 Thread Christof Roduner

Hi,

Due to some network renumbering, we will need to change the IP addresses 
(and networks) of all nodes in our cluster.


Will the following procedure work, or is there anything special we'll 
need to consider?


  1. shut down all nodes
  2. update listen_address, rpc_address, seeds in cassandra.yaml
  3. update cassandra-topology.properties
  4. start all nodes

I'm asking because I remember some older reports (from 2011) about 
possible issues.


We are currently on 1.2.x.

Thanks,
Christof


Re: Change IP address for all nodes

2014-04-01 Thread Robert Coli
On Tue, Apr 1, 2014 at 12:53 PM, Christof Roduner chris...@scandit.com wrote:

 Will the following procedure work, or is there anything special we'll need
 to consider?


If you do not use auto_bootstrap:false, this will not work.

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

Be careful, within some versions of 1.2.x there is an issue with using
auto_bootstrap which can result in broken node state in gossip. Do not do
this procedure in one of those versions! If you have trouble finding it in
JIRA/CHANGES.txt, let me know and I can try to dig it up.

=Rob


Re: Read performance in map data type

2014-04-01 Thread Robert Coli
 On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav
apoorva.gau...@myntra.com wrote:

 Thanks Robert, Is there a workaround, as in our test setups we keep
 dropping and recreating tables.


Use unique keyspace (or table) names for each test? That's the approach
they're taking in 5202...

=Rob


Re: Dead node appearing in datastax driver

2014-04-01 Thread Apoorva Gaurav
I manually removed entries from System.peers.

The improvement could well be coincidental, as various other apps were also
running on the same test bed.







-- 
Thanks & Regards,
Apoorva


Re: Opscenter help?

2014-04-01 Thread Patricia Gorla
- Were you running a mixed EL5/EL6 environment?
- What exact version of Cassandra were you upgrading from?

As for SE, you have to amass a bit of their magic points before you can
start to take more actions (tagging, down-voting, etc). Surprisingly good
at keeping down spam.
--
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Drop in node replacements.

2014-04-01 Thread Redmumba
Is it possible to have true drop-in node replacements?  For example, I
have a cluster of 51 Cassandra nodes, 17 in each data center.  I had one
host go down in DC3, and when it came back up, it joined the ring, etc.,
but was not receiving any data.  Even after multiple restarts and forcing a
repair on the entire fleet, it still holds maybe ~30MB in a cluster that is
absorbing ~1.2TB a day.

On top of that, I decided to see if I could recreate it--by taking down a
node, reprovisioning it, and then throwing it back in WITHOUT having it
take over the old node's tokens, it never seems to ever absorb any of the
old data after a full repair, and it never seems to start loading new data
(I now have 3 nodes using ~30MB).

Am I doing something wrong?  I would imagine a repair on the entire cluster
(across 3 DCs) would force C* to put some copies onto the other node--but
this doesn't seem to be the case.  What can I do?

Andrew


Re: Specifying startBefore with iterators with compositeKeys

2014-04-01 Thread Tyler Hobbs
On Tue, Apr 1, 2014 at 9:53 AM, Brian Tarbox tar...@cabotresearch.com wrote:

 I think another way of saying this is: does HeapByteBuffer with
 pos=,lim=0,cap=0 sort prior to any other possible HeapByteBuffer?


Yes.  However, if you use it as a slice finish, an empty ByteBuffer is
greater than any other value.
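The slice-start half of this can be checked against the JDK directly, since ByteBuffer.compareTo is lexicographic over the remaining bytes and an empty buffer has none (this only demonstrates plain JDK buffer ordering; Cassandra's CompositeType adds end-of-component semantics on top, which is why an empty finish behaves differently):

```java
import java.nio.ByteBuffer;

// Sketch: shows that an empty HeapByteBuffer compares strictly before any
// non-empty buffer, because ByteBuffer.compareTo compares the remaining
// bytes lexicographically and an empty buffer has none remaining.
public class EmptyBufferOrder {
    public static boolean sortsBefore(ByteBuffer a, ByteBuffer b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);         // pos=0, lim=0, cap=0
        ByteBuffer zero = ByteBuffer.wrap(new byte[] {0}); // single 0x00 byte
        System.out.println(sortsBefore(empty, zero));      // prints "true"
    }
}
```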


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Row cache for writes

2014-04-01 Thread Tyler Hobbs
On Mon, Mar 31, 2014 at 11:37 AM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I found a lot of documentation about the read path for key and row caches,
 but I haven't found anything in regard to the write path.  My app has the
 need to record a large quantity of very short lived temporal data that will
 expire within seconds and only have a small percentage of the rows accessed
 before they expire.  Ideally, and I have done the math, I would like the
 data to never hit disk and just stay in memory once written until it
 expires.  How might I accomplish this?


It's not perfect, but set a short TTL on the data and set gc_grace_seconds
to 0 for the table.  Tombstones will still be written to disk, but almost
everything will be discarded in its first compaction.  You could also lower
the min compaction threshold for size-tiered compaction to 2 to force
compactions to happen more quickly.


  I am not concerned about data consistency at all on this so if I could
 even avoid the commit log, that would be even better.


You can set durable_writes = false for the keyspace.



 My main concern is that I don't see any evidence that writes end up in the
 cache--that it takes at least one read to get it into the cache.  I also
 realize that, assuming I don't cause SSTable writes due to sheer quantity,
 that the data would be in memory anyway.

 Has anyone done anything similar to this that could provide direction?


Writes invalidate row cache entries, so that's not what you want.
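Taken together, the suggestions in this thread might look like the following CQL sketch (keyspace and table names are invented; the 30-second TTL is just an example):

```sql
-- Skip the commit log for this keyspace (acceptable only because
-- durability is explicitly not a concern here):
CREATE KEYSPACE shortlived
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
  AND durable_writes = false;

-- Discard tombstones at the first compaction, and let size-tiered
-- compaction kick in with as few as 2 sstables:
CREATE TABLE shortlived.events (
  id timeuuid PRIMARY KEY,
  payload blob
) WITH gc_grace_seconds = 0
  AND compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '2'};

-- Expire each row a few seconds after it is written:
INSERT INTO shortlived.events (id, payload) VALUES (now(), 0x00) USING TTL 30;
```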


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Drop in node replacements.

2014-04-01 Thread Robert Coli
On Tue, Apr 1, 2014 at 3:24 PM, Redmumba redmu...@gmail.com wrote:

 Is it possible to have true drop in node replacements?  For example, I
 have a cluster of 51 Cassandra nodes, 17 in each data center.  I had one
 host go down on DC3, and when it came back up, it joined the ring, etc.,
 but was not receiving any data.  Even after multiple restarts and forcing a
 repair on the entire fleet, it still holds maybe ~30MB on a cluster that is
 absorbing ~1.2TB a day.


What version of Cassandra? Real hardware/network or virtual?

=Rob


Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
Thanks Sourabh,

I've modelled my table as (studentID int, subjectID int, marks int, PRIMARY
KEY(studentID, subjectID)) as primarily I'll be querying using studentID
and sometimes using studentID and subjectID.

I've tried driver 2.0.0 and it's giving good results. I'm also using its auto
paging feature. Any idea what a typical value for fetch size should be? And
does the fetch size depend on how many columns there are in the CQL table?
E.g., should the fetch size in a table like (studentID int, subjectID int,
marks1 int, marks2 int, marks3 int ... marksN int, PRIMARY KEY(studentID,
subjectID)) be less than the fetch size in (studentID int, subjectID int,
marks int, PRIMARY KEY(studentID, subjectID))?






-- 
Thanks & Regards,
Apoorva


Re: Read performance in map data type

2014-04-01 Thread Sourabh Agrawal
From the doc: "The fetch size controls how much resulting rows will be
retrieved simultaneously."
So, I guess it does not depend on the number of columns as such. As all the
columns for a key reside on the same node, I think it wouldn't matter much
whatever the number of columns is, as long as we have enough memory in the
app.

Default value is 5000. (com.datastax.driver.core.QueryOptions)

We use it with the default value. I have never profiled cassandra for read
load. If you profile it for different fetch sizes, please share the results
:)






-- 
Sourabh Agrawal
Bangalore
+91 9945657973