Re: Reasonable range for the max number of tables?

2014-08-05 Thread Phil Luckhurst
Is there any mention of this limitation anywhere in the Cassandra
documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra'
section of the DataStax 2.0 documentation or anywhere else.

When starting out with Cassandra as a store for a multi-tenant application
it seems very attractive to segregate data for each tenant using a
tenant-specific keyspace, each with its own set of tables. It's not until you
start browsing through forums such as this one that you discover it isn't
going to scale above a few tenants.
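The pattern that looks so attractive at first is simply one keyspace and one
set of tables per tenant, e.g. (keyspace and table names here are purely
illustrative):

CREATE KEYSPACE tenant_acme
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE tenant_acme.measurement (
measurement_id uuid PRIMARY KEY,
controller_id uuid,
value double
);

Every new tenant then adds another complete set of column families, which is
exactly where the limit bites.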

If you want to be able to segregate customer data in Cassandra, is it
accepted practice to run multiple Cassandra installations?





Re: Reasonable range for the max number of tables?

2014-08-05 Thread Phil Luckhurst
Hi Mark,

Mark Reddy wrote
 To segregate customer data, you could:
 - Use customer specific column families under a single keyspace
 - Use a keyspace per customer

These effectively amount to the same thing, and they both fall foul of the
limit on the number of column families, so they do not scale.


Mark Reddy wrote
 - Use the same column families and have a column that identifies the
 customer. On the application layer ensure that there are sufficient checks
 so one customer can't read another customer's data

And while this gets around the column family limit, it does not allow the
same level of data segregation. For example, with a separate keyspace or
separate column families it is trivial to remove a single customer's data or
move that data to another system. With one set of column families shared by
all customers these kinds of actions become much more difficult, as any
change impacts all customers, but perhaps that's the price we have to pay to
scale.
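For comparison, here is a minimal sketch of that shared-table approach, with
the customer identifier pushed into the partition key (table and column names
are hypothetical):

CREATE TABLE measurement_by_tenant (
tenant_id uuid,
measurement_id uuid,
controller_id uuid,
value double,
PRIMARY KEY ((tenant_id), measurement_id)
);

-- every query must be scoped to the authenticated customer by the application
SELECT * FROM measurement_by_tenant
WHERE tenant_id = a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11;

Any per-customer operation then has to be expressed as queries against shared
tables rather than as a keyspace- or table-level operation.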

And I still think this needs to be made more prominent in the documentation.

Thanks
Phil





Re: RPC timeout paging secondary index query results

2014-07-02 Thread Phil Luckhurst
Ken Hancock wrote
 You didn't post any timings, only when it started failing so it's unclear
 whether performance is dropping off or scaling in some sort of linear or
 non-linear fashion. Second the recommendation to do some traces which
 should be much more telling.

I'm afraid I've not yet had time to pursue this any further. I take your
point about the traces, but the fact that performance drops off so quickly,
to the point where we can't complete any queries, means that secondary
indexes as they stand are not going to work for us, so we are going to have
to rework our data model to avoid them.





Re: RPC timeout paging secondary index query results

2014-06-13 Thread Phil Luckhurst
But would you expect performance to drop off so quickly? At 250,000 records
we can still page through the query results with LIMIT 5, but after adding a
further 50,000 records we can't page past the first 10,000 records even
if we drop to LIMIT 10.

What about the case where we add 100,000 records for each indexed value?
When we do this for 2 values, i.e. 200,000 records with 2 indexed values, we
can query all 100,000 records for one of the values using LIMIT 100000. If
we add a third indexed value with another 100,000 records then we can't page
through any of the indexed values even though the original 2 that worked
previously have not changed.

Phil





Re: RPC timeout paging secondary index query results

2014-06-12 Thread Phil Luckhurst
The problem appears to be directly related to the number of entries in the index.
I started with an empty table and added 50,000 entries at a time with the
same indexed value. I was able to page through the results of a query that
used the secondary index with 250,000 records in the table using a LIMIT
5 clause. When I added another 50,000 to take it up to 300,000 entries I
started getting the RPC timeout and could not page past the first 10,000
records even if I dropped the LIMIT to 10 records at a time.

I then changed the test slightly and added two batches of 100,000 records
with two different indexed values. With 200,000 records in total I could
query all 100,000 records for one of the indexed values with a single query
with LIMIT 100000. However, as soon as I added a third batch of 100,000
records with a different indexed value the queries again started to timeout
so that I couldn't page through the results even with a smaller LIMIT size.

It appears that in our configuration a secondary index can start to fail
when it reaches approx 300,000 entries. I'll open a JIRA.





RPC timeout paging secondary index query results

2014-06-11 Thread Phil Luckhurst
Is paging through the results of a secondary index query broken in Cassandra
2.0.7 or are we doing something wrong?

We have a table with a few hundred thousand records and an indexed
low-cardinality column. The relevant parts of the table definition are shown
below:

CREATE TABLE measurement (
measurement_id uuid,
controller_id uuid,
...
PRIMARY KEY (measurement_id)
);

CREATE INDEX ON measurement(controller_id);

We originally got the timeout when trying to page through the results of a
'SELECT * FROM measurement WHERE controller_id = xxx-xxx-xxx' query using
the Java driver 2.0.2, but we can also consistently reproduce the problem
with CQLSH.

In CQLSH we can start paging through the measurement_id entries 1000 at a
time for a specific controller_id by using the token() function, e.g.

SELECT measurement_id, token(measurement_id) FROM measurement WHERE
controller_id = 0167bfa6-0918-47ba-8b65-dcccecbcd79f AND
token(measurement_id) > -8975013189301561463 LIMIT 1000;

This works for 8 queries but consistently fails with an RPC timeout for rows
8000-9000. If from row 8000 we start using a smaller LIMIT size we can get
to approx row 8950 but at that point we get the timeout even if we set
'LIMIT 10'. Looking at the trace output it seems to be doing
thousands of queries on the index table for every request even if we set
'LIMIT 1' - almost as if it's starting from the beginning of the index for
each page request?

It all seems very similar to CASSANDRA-5975
(https://issues.apache.org/jira/browse/CASSANDRA-5975) but that is marked
as resolved in Cassandra 2.0.1. For example, this query for a single record

SELECT measurement_id, token(measurement_id) FROM measurement WHERE
controller_id = 0167bfa6-0918-47ba-8b65-dcccecbcd79f AND
token(measurement_id) = -8947401969768490998;

works fine and produces approx 60 lines of trace output. If we simply add
'LIMIT 1' to the statement the trace output is approx 70,000 lines!

It looks like we may have to give up on using secondary indexes, but it would
be nice to know whether what we are trying to do is correct and should work.

Thanks
Phil











Re: RPC timeout paging secondary index query results

2014-06-11 Thread Phil Luckhurst
Thanks Rob.

I understand that we will probably end up either creating our own index or
duplicating the data, and we have already done that to remove a reliance on
secondary indexes in other places. It just seems that what we are trying to
do here is such basic index functionality that I thought we must be doing
something wrong for it to appear this broken.
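For the record, the sort of 'own index' we expect to end up maintaining is
just a second table keyed the other way round, along these lines (names are
illustrative):

-- maintained by the application alongside the measurement table
CREATE TABLE measurement_by_controller (
controller_id uuid,
measurement_id uuid,
PRIMARY KEY ((controller_id), measurement_id)
);

SELECT measurement_id FROM measurement_by_controller
WHERE controller_id = 0167bfa6-0918-47ba-8b65-dcccecbcd79f LIMIT 1000;

The application has to keep the two tables in step itself, but the paging
query becomes an ordinary slice of a single partition.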

Phil





Re: Change number of vnodes on an existing cluster

2014-05-23 Thread Phil Luckhurst
Thank you Rob, that's all really useful information. As our production
cluster is going to grow over time, it looks like we may need to stick with
vnodes (though maybe not 256) and, as you say, hope that the work to improve
their inefficiencies progresses quickly.

 In real reality, vnodes were almost certainly set to default in 2.0 so
 that they could be hardened by both early adopters (cassandra experts) and
 noobs (cassandra anti-experts) encountering problems with this default. As
 Eric Evans mentioned at the time in a related post, this is to some extent
 the reality of open source software development. Where he and I appear to
 disagree is on whether it is reasonable to set new features as defaults
 and thereby use noobs as part of your QA process for same.

I totally agree with you here. As a 'noob' reading through the Cassandra
documentation before installing for the first time, everything pushes you
toward using vnodes. It's not until you start reading forums such as this
one, blog posts, and JIRA issues that it becomes clear that they do come
with caveats.

I'll look forward to the 'always-specify-initial_token best practice' post.

Thanks,
Phil





RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-22 Thread Phil Luckhurst
Definitely no TTL and records are only written once with no deletions.

Phil


DuyHai Doan wrote
 Are you sure there is no TTL set on your data? It might explain the shrink
 in sstable size after compaction.







Change number of vnodes on an existing cluster

2014-05-22 Thread Phil Luckhurst
We have a 3 node 2.0.7 cluster with RF=3. At the moment these are configured
with the default 256 vnodes; we'd like to try reducing that to see what
effect it has on some of our CQL query times.

It seems from CASSANDRA-7057
(https://issues.apache.org/jira/browse/CASSANDRA-7057) that there is no
automatic method for this, but is it possible to do it manually and, if so,
what are the steps required? Do we need to add new nodes with the number of
vnodes we require and then decommission the existing ones, or is it possible
to do it with just our existing 3 nodes?

Thanks
Phil





Re: Change number of vnodes on an existing cluster

2014-05-22 Thread Phil Luckhurst
Thanks Rob, I didn't realize that you could use the initial_token setting
when using vnodes.

I see what you mean now that with RF=N having multiple vnodes is not
actually achieving anything unless we add further nodes; we hadn't really
considered that when we initially installed with the default yaml file.

For a small cluster, e.g. 9 nodes with RF=3, would you actually recommend
using vnodes at all and, if so, how many?
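Just so I'm clear on step 3 in your reply below, I'm assuming the end result
in cassandra.yaml on each node would look something like this (the token
values here are made up and purely for illustration):

# reducing from 256 vnodes to 4 tokens chosen from the node's existing set
num_tokens: 4
initial_token: -9181802597204574179,-4603672064761164524,12702628948903953,4582663286229779311
auto_bootstrap: false   # for the rolling restart in step 4

Is that right?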

Phil



Robert Coli-3 wrote
 On Thu, May 22, 2014 at 4:31 AM, Phil Luckhurst phil.luckhurst@ wrote:
 
 We have a 3 node 2.0.7 cluster with RF=3. At the moment these are
 configured
 to have the default 256 vnodes we'd like to try reducing that to see what
 effect it has on some of our CQL query times.

 
 Because you have RF=N, all nodes have all data. This means that it is
 actually irrelevant how many vnodes (or nodes) you have; you just lose
 from using them at all.
 
 However to reduce the number of vnodes to a number that is ok in your case
 but also might be reasonable when you have RF != N, you can just :
 
 1) get a list of tokens per node via a one-liner like this :
 
 nodetool info -T | grep Token | awk '{print $3}' | paste -s -d,
 
 2) modify this list by removing however many tokens you want to get to the
 new number of vnodes
 
 3) insert this list into the initial_token line of cassandra.yaml on each
 node [1]
 
 4) rolling re-start nodes with auto_bootstrap:false [2]
 
 My *belief* is that you do not need a step 3.5 'nuke the system keyspace
 and reload schema', potentially with the entire cluster down, but it's
 possible that other nodes may remember your old vnodes unless you do. Test
 in a non-production environment, obviously.
 
 If the above is too complicated and you have the spare hosts, adding 3 new
 nodes and then decommissioning the old ones is a safe and simple way to
 achieve the same goal.
 
 =Rob
 [1] Note that I recommend this as a best practice for the use of vnodes,
 always populate initial_token.
 [2]
 https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/







RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-22 Thread Phil Luckhurst
Hi Andreas,

So does that mean it can compact the 'hottest' partitions into a new sstable
but the old sstables may not immediately be removed, so the same data could
be in more than one sstable? That would certainly explain the difference we
see when we manually run nodetool compact.

Thanks
Phil


Andreas Finke wrote
 Hi Phil,
 
 I found an interesting blog entry that may address your problem.
 
 http://www.datastax.com/dev/blog/optimizations-around-cold-sstables
 
 It seems that compaction is skipped for sstables which do not satisfy a
 certain read rate. Please check.
 
 
 Kind regards
 
 Andreas Finke
 Java Developer
 Solvians IT-Solutions GmbH
 
 
  Phil Luckhurst wrote 
 
 Definitely no TTL and records are only written once with no deletions.
 
 Phil
 
 
 DuyHai Doan wrote
 Are you sure there is no TTL set on your data? It might explain the
 shrink
 in sstable size after compaction.
 
 
 
 
 







Re: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Phil Luckhurst
I'm wondering if the lack of response to this means it was a dumb question;
however, I've searched the documentation again and I still can't find an
answer :-(

Phil





RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Phil Luckhurst
We based the estimate on a previous controlled observation. We generated a
year's worth of one-minute data for a single identifier and recorded the
size of the resulting sstable. By adding the data one month at a time we
observed a linear, predictable increase in the sstable size. Using this, we
simply multiplied by the number of identifiers, in this case 700, to get the
7GB estimate. And as noted above, this estimate is correct once the data is
compacted to one sstable but is wrong when there are multiple sstables.

Phil


Andreas Finke wrote
 Hi Phil,
 
 there is no dumb question ;) What is your size estimation based on, e.g.
 what size is a column in your calculation?







Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-19 Thread Phil Luckhurst
We have a table defined with SizeTieredCompactionStrategy that is used to
store time series data. Over a period of a few days we wrote approximately
200,000 unique time-based entries for each of 700 identifiers, i.e. 700 wide
rows with 200,000 entries in each. The table was empty when we started and
there were no updates to any entries, no deletions, and no tombstones
were created.

Our estimates suggested that this should have required about 7GB of disk
space but when we looked on disk there were 8 sstables taking up 11GB of
space.
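As a rough back-of-the-envelope check, the 7GB figure works out to about 50
bytes per entry:

700 identifiers x 200,000 entries = 140,000,000 entries
7GB / 140,000,000 entries ≈ 50 bytes per entry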

Running nodetool compact on the column family reduced it to a single sstable
that does match our 7GB estimate.

I'd like to understand what accounts for the other 4GB when the data was
stored as multiple sstables. Is it because the individual sstables overlap?

Thanks
Phil





Re: binary protocol server side sockets

2014-04-11 Thread Phil Luckhurst
We are also seeing this in our development environment. We have a 3 node
Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a
Tomcat-based application running on Windows using the 2.0.0 Cassandra Java
Driver. We call setKeepAlive(true) when building the cluster in the
application, and this does keep one connection open on the client side to
each of the 3 Cassandra nodes, but we still see the build-up of 'old'
ESTABLISHED connections on each of the Cassandra servers.

We are also getting that same 'Unexpected exception during request' error
appearing in the logs:

ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(Unknown Source)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
at sun.nio.ch.IOUtil.read(Unknown Source)
at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
at
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)

Initially we thought this was down to a firewall that sits between our
development machines and the Cassandra nodes, but that has now been
configured not to 'kill' any connections on port 9042. We also have the
Windows firewall on the client side turned off.

We still think this is down to our environment, as the same application
running in Tomcat hosted on an Ubuntu 12.04 server does not appear to be
doing this, but up to now we can't track down the cause.






Re: binary protocol server side sockets

2014-04-11 Thread Phil Luckhurst
We have considered this but wondered how well it would work, as the Cassandra
Java Driver opens multiple connections internally to each Cassandra node. I
suppose it depends how those connections are used internally; if it's round
robin then it should work. Perhaps we just need to try it.
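If we do try it, I assume something as simple as this would be enough (table
and column names are just for illustration):

CREATE TABLE dummy_keepalive (id int PRIMARY KEY, val int);
INSERT INTO dummy_keepalive (id, val) VALUES (1, 1);

-- issued by the application whenever a connection has been idle for a minute or so
SELECT val FROM dummy_keepalive WHERE id = 1;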

--
Thanks
Phil


Chris Lohfink wrote
 TCP keep alives (by the setTimeout) are notoriously useless... The default
 2 hours is generally far longer than any timeout in NAT translation tables
 (generally ~5 min) and even if you decrease the keep alive to a sane value
 a lot of networks actually throw away TCP keep alive packets. You see that
 a lot more in cell networks though. It's almost always a good idea to have
 a software keep alive although it seems to be not implemented in this
 protocol. You can make a super simple CF with 1 value and query it every
 minute a connection is idle or something, i.e. select * from DummyCF
 where id = 1
 
 -- 
 Chris Lohfink
 Engineer
 415.663.6738  |  Skype: clohfink.blackbirdit
 Blackbird
 775.345.3485  |  www.blackbirdIT.com <http://www.blackbirdit.com/>
 Formerly PalominoDB/DriveDev
 
 
 On Fri, Apr 11, 2014 at 3:04 AM, Phil Luckhurst phil.luckhurst@ wrote:
 
 We are also seeing this in our development environment. We have a 3 node
 Cassandra 2.0.5 cluster running on Ubuntu 12.04 and are connecting from a
 Tomcat based application running on Windows using the 2.0.0 Cassandra
 Java
 Driver. We have setKeepAlive(true) when building the cluster in the
 application and this does keep one connection open on the client side to
 each of the 3 Cassandra nodes, but we still see the build up of 'old'
 ESTABLISHED connections on each of the Cassandra servers.

 We are also getting that same Unexpected exception during request
 exception appearing in the logs

 ERROR [Native-Transport-Requests:358378] 2014-04-09 12:31:46,824
 ErrorMessage.java (line 222) Unexpected exception during request
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(Unknown Source)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
 at sun.nio.ch.IOUtil.read(Unknown Source)
 at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
 at

 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
 at

 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
 at

 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
 at java.lang.Thread.run(Unknown Source)

 Initially we thought this was down to a firewall that is between our
 development machines and the Cassandra nodes but that has now been
 configured not to 'kill' any connections on port 9042. We also have the
 Windows firewall on the client side turned off.

 We still think this is down to our environment as the same application
 running in Tomcat hosted on a Ubuntu 12.04 server does not appear to be
 doing this but up to now we can't track down the cause.





 
 
 







Re: Gossip intermittently marks node as DOWN

2014-03-19 Thread Phil Luckhurst
I think we've found the issue!

It seems that the times on those Cassandra servers were being kept in sync by
VMware Tools using the time of the VMware host machine. We have now turned
that off and are using the NTP service to keep the times in sync, like we do
for our physical servers, and we have not seen the gossip failures for the
last 24 hours.

--
Phil





RE: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
The VMs are hosted on the same ESXi server and they are just running
Cassandra. We see this happen even when the nodes appear to be idle, about 2
to 4 times per hour.


Phil





Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
The cluster was created with the default settings, so we have 256 vnodes per node.


Fabrice Facorat wrote
 From what I understand, this can happen when having many nodes and
 vnodes per node. How many vnodes did you configure on your nodes?
 
 2014-03-04 11:37 GMT+01:00 Phil Luckhurst <phil.luckhurst@>:
 The VMs are hosted on the same ESXi server and they are just running
 Cassandra. We seem to get this happen even if the nodes appear to be
 idle;
 about 2 to 4 times per hour.

 
 Phil



 
 
 
 -- 
 Close the World, Open the Net
 http://www.linux-wizard.net







Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
Here's the tpstats output from both nodes.






Johnny Miller wrote
 What is nodetool tpstats telling you?







Gossip intermittently marks node as DOWN

2014-03-03 Thread Phil Luckhurst
We have a 2 node Cassandra 2.0.5 cluster for testing, running on a couple of
VMware-hosted virtual machines using Ubuntu 12.04. As you can see from the
log entries below, the gossip connection between the nodes regularly goes
DOWN and UP. We saw on another post that increasing phi_convict_threshold may
help with this, so we increased it to 12, but we still get the same
problem.

 INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863)
InetAddress /10.150.100.20 is now DOWN 
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951
OutboundTcpConnection.java (line 386) Handshaking version with
/10.150.100.20 
 INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line
849) InetAddress /10.150.100.20 is now UP 
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:53:52,100
OutboundTcpConnection.java (line 386) Handshaking version with
/10.150.100.20 
 INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863)
InetAddress /10.150.100.20 is now DOWN 
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:06:52,963
OutboundTcpConnection.java (line 386) Handshaking version with
/10.150.100.20 
 INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line
849) InetAddress /10.150.100.20 is now UP 
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:14:09,613
OutboundTcpConnection.java (line 386) Handshaking version with
/10.150.100.20 

 Has anyone got any suggestions for fixing this? 
  
 Thanks 
 Phil





Invalid compacted_at timestamp entries in Cassandra 2.0.5

2014-03-03 Thread Phil Luckhurst
Running 'nodetool compactionHistory' seems to be showing strange timestamp
values in the 'compacted_at' column, e.g.:
id                                    keyspace_name  columnfamily_name  compacted_at      bytes_in  bytes_out  rows_merged
cb035320-9f11-11e3-82e3-e37a59d03017  system         sstable_activity   1212036306964769  74352     19197      {1:19, 4:427}

And running a CQL query on the system.compaction_history table shows dates
well in the future.

 id                                   | bytes_in | bytes_out | columnfamily_name       | compacted_at              | keyspace_name | rows_merged
--------------------------------------+----------+-----------+-------------------------+---------------------------+---------------+--------------
 bda494f0-9db8-11e3-bb85-ed7074988754 |      647 |       320 | compactions_in_progress | 17391-03-07 12:26:35+     | system        | {1: 3, 2: 1}
 dc87cd00-a269-11e3-a1a8-ed7074988754 |      410 |       159 | compactions_in_progress | 33738-09-07 17:03:21+0100 | system        | {1: 2, 2: 1}

Is this a known issue or something wrong on our system?

Thanks
Phil






Re: Invalid compacted_at timestamp entries in Cassandra 2.0.5

2014-03-03 Thread Phil Luckhurst
Thanks.


