Re: High read latency after data volume increased

2015-01-08 Thread Roni Balthazar
Hi Robert,

We downgraded to 2.1.1, but got the very same result. The read latency is
still high, but we figured out that it happens only when using a specific
keyspace.
Please see the graphs below...

[latency graphs not included in the digest]
Trying another keyspace with 600+ reads/sec, we are getting an acceptable
~30 ms read latency.

Let me know if I need to provide more information.

Thanks,

Roni Balthazar

On Thu, Jan 8, 2015 at 5:23 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com
 wrote:

 We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2.


 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

 2.1.2 in particular is known to have significant issues. You'd be better
 off running 2.1.1 ...

 =Rob




C* throws OOM error despite use of automatic paging

2015-01-08 Thread Mohammed Guller
Hi -

We have an ETL application that reads all rows from Cassandra (2.1.2), filters
them and stores a small subset in an RDBMS. Our application is using DataStax's
Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver
supports automatic paging, I was under the impression that SELECT queries
should not cause an OOM error on the C* nodes. However, even with just 16 GB of
data on each node, the C* nodes start throwing OOM errors as soon as the
application starts iterating through the rows of a table.

The application code looks something like this:

Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
  Row row = rs.one();
  process(row);
}
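
For reference, a minimal self-contained sketch along the same lines (the
contact point and keyspace are placeholders); it also prints how many rows the
driver has buffered client-side, which with automatic paging should stay around
one fetch-size worth of rows:

import com.datastax.driver.core.*;

public class PagingCheck {
    public static void main(String[] args) {
        // Placeholders: adjust the contact point and keyspace to the real cluster.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf").setFetchSize(5000);
        ResultSet rs = session.execute(stmt);

        long count = 0;
        for (Row row : rs) {            // the iterator fetches further pages transparently
            process(row);
            if (++count % 5000 == 0) {
                System.out.println("rows buffered client-side: "
                        + rs.getAvailableWithoutFetching());
            }
        }
        cluster.close();
    }

    private static void process(Row row) { /* filter and store in the RDBMS */ }
}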

Even after we reduced the page size to 1000, the C* nodes still crash. C* is
running on M3.xlarge machines (4 cores, 15 GB). We manually increased the heap
size to 8 GB just to see how much heap C* consumes. Within 10-15 minutes, the
heap usage climbs up to 7.6 GB. That does not make sense. Either automatic
paging is not working or we are missing something.

Does anybody have insights as to what could be happening? Thanks.

Mohammed




Re: High read latency after data volume increased

2015-01-08 Thread Robert Coli
On Thu, Jan 8, 2015 at 6:38 PM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 We downgraded to 2.1.1, but got the very same result. The read latency is
 still high, but we figured out that it happens only when using a specific
 keyspace.


Note that downgrading is officially unsupported, but is probably safe
between those two versions.

Enable tracing and paste results for a high latency query.
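
Here is a minimal sketch of capturing a trace programmatically via the Java
driver; the query and "session" are placeholders, and TRACING ON in cqlsh gives
the same information:

import com.datastax.driver.core.*;

// Sketch only: trace a single (placeholder) query and dump the trace events.
static void traceOne(Session session) {
    Statement stmt = new SimpleStatement("SELECT x, y, z FROM cf LIMIT 100").enableTracing();
    ResultSet rs = session.execute(stmt);
    QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
    System.out.println("coordinator: " + trace.getCoordinator()
            + ", duration (us): " + trace.getDurationMicros());
    for (QueryTrace.Event e : trace.getEvents()) {
        System.out.println(e.getSourceElapsedMicros() + " us  " + e.getDescription());
    }
}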

Also, how much RAM is used for heap?

=Rob


Re: Rebooted cassandra node timing out all requests but recovers after a while

2015-01-08 Thread Anand Somani
I will keep an eye out for that if it happens again. Times are synchronized
at this point.

On Wed, Jan 7, 2015 at 10:37 PM, Duncan Sands duncan.sa...@gmail.com
wrote:

 Hi Anand,


 On 08/01/15 02:02, Anand Somani wrote:

 Hi,

 We have a 3 node cluster (on VMs), e.g. host1, host2, host3. One of the VMs
 (host1) rebooted, and when host1 came up it would see the others as down,
 and the others (host2 and host3) saw it as down. So we restarted host2 and
 now the ring seems fine (everybody sees everybody as up).

 But now the clients time out talking to host1. We have not figured out what
 is causing it. There is nothing in the logs that indicates a problem. Looking
 for indicators/help on what debug/tracing to turn on to find out what could
 be causing it.

 Now this happens only when a VM reboots (not otherwise). It also seems to
 have recovered by itself after some hours (or after restarts), not sure
 which one.

 This is 1.2.15; we are using SSL and Cassandra authorizers.


 perhaps time is not synchronized between the nodes to begin with, and
 eventually becomes synchronized.

 Ciao, Duncan.



User audit in Cassandra

2015-01-08 Thread Ajay
Hi,

Is there a way to enable user auditing or tracing if we have enabled
PasswordAuthenticator in cassandra.yaml and set up the users as well? I
noticed there are the keyspaces system_auth and system_traces, but there is
no way to find out which user initiated which session. Is there any way to
find out? Also, is it recommended to enable tracing in production, or is
there another way to know how many sessions were started by a user?

Thanks
Ajay


Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread Robert Wille
After bootstrapping a node, the node repeatedly compacts the same tables over
and over, even though my cluster is completely idle. I’ve noticed the same
behavior after extended periods of heavy writes. I realize that during
bootstrapping (or extended periods of heavy writes) compaction could get
seriously behind, but once a table has been compacted, I don’t see the need
to recompact the table dozens more times.

Possibly related, I often see that OpsCenter reports that nodes have a large
number of pending tasks, when the Pending column of the Thread Pool Stats
doesn’t reflect that.

Robert



incremental repairs

2015-01-08 Thread Roland Etzenhammer

Hi,

I am currently trying to migrate my test cluster to incremental repairs. 
These are the steps I'm doing on every node:


- touch marker
- nodetool disableautocompaction
- nodetool repair
- cassandra stop
- find all *Data*.db files older than marker
- invoke sstablerepairedset on those
- cassandra start

This is essentially what
http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 says.
After all nodes are migrated this way, I think I need to run my regular
repairs more often and they should be faster afterwards. But do I need
to run a full nodetool repair or is nodetool repair -pr sufficient?


And do I need to re-enable autocompaction? Or do I need to compact manually?

Thanks for any input,
Roland


Re: incremental repairs

2015-01-08 Thread Marcus Eriksson
If you are on 2.1.2+ (or using STCS) you don't need those steps (we should
probably update the blog post).

Now we keep separate levelings for the repaired/unrepaired data and move
the sstables over after the first incremental repair.

But, if you are running 2.1 in production, I would recommend that you wait
until 2.1.3 is out: https://issues.apache.org/jira/browse/CASSANDRA-8316
fixes a bunch of issues with incremental repairs.

-pr is sufficient; the same rules apply as before: if you run -pr, you need
to run it on every node.

/Marcus

On Thu, Jan 8, 2015 at 9:16 AM, Roland Etzenhammer 
r.etzenham...@t-online.de wrote:

 Hi,

 I am currently trying to migrate my test cluster to incremental repairs.
 These are the steps I'm doing on every node:

 - touch marker
 - nodetool disableautocompaction
 - nodetool repair
 - cassandra stop
 - find all *Data*.db files older than marker
 - invoke sstablerepairedset on those
 - cassandra start

 This is essentially what
 http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1 says.
 After all nodes are migrated this way, I think I need to run my regular
 repairs more often and they should be faster afterwards. But do I need to
 run a full nodetool repair or is nodetool repair -pr sufficient?

 And do I need to re-enable autocompaction? Or do I need to compact manually?

 Thanks for any input,
 Roland



Re: incremental repairs

2015-01-08 Thread Roland Etzenhammer

Hi Marcus,

thanks for that quick reply. I did also look at:

http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html

which describes the same process. It is for 2.1.x, but I see that 2.1.2+ is
not covered there. I did upgrade my test cluster to 2.1.2, and with your
hint I took a look at sstablemetadata output from a non-migrated node, and
there are indeed "Repaired at" entries on some sstables already. So if I
got this right, in 2.1.2+ there is nothing to do to switch to
incremental repairs (apart from running the repairs themselves).


But one thing I see during testing is that there are many sstables with
small sizes:


- in total there are 5521 sstables on one node
- 115 sstables are bigger than 1MB
- 4949 sstables are smaller than 10kB

I don't know where they came from - I found one report of this happening
when Cassandra was low on heap, which happened to me while running tests
(the suggested solution there is to trigger a compaction via JMX).


Question for me: I did disable autocompaction on some nodes of our test 
cluster as the blog and docs said. Should/can I reenable autocompaction 
again with incremental repairs?


Cheers,
Roland





Re: incremental repairs

2015-01-08 Thread Marcus Eriksson
Yes, you should reenable autocompaction

/Marcus

On Thu, Jan 8, 2015 at 10:33 AM, Roland Etzenhammer 
r.etzenham...@t-online.de wrote:

 Hi Marcus,

 thanks for that quick reply. I did also look at:

 http://www.datastax.com/documentation/cassandra/2.1/
 cassandra/operations/ops_repair_nodes_c.html

 which describes the same process. It is for 2.1.x, but I see that 2.1.2+ is
 not covered there. I did upgrade my test cluster to 2.1.2, and with your
 hint I took a look at sstablemetadata output from a non-migrated node, and
 there are indeed "Repaired at" entries on some sstables already. So if I got
 this right, in 2.1.2+ there is nothing to do to switch to incremental
 repairs (apart from running the repairs themselves).

 But one thing I see during testing is that there are many sstables with
 small sizes:

 - in total there are 5521 sstables on one node
 - 115 sstables are bigger than 1MB
 - 4949 sstables are smaller than 10kB

 I don't know where they came from - I found one report of this happening
 when Cassandra was low on heap, which happened to me while running tests
 (the suggested solution there is to trigger a compaction via JMX).

 Question for me: I did disable autocompaction on some nodes of our test
 cluster as the blog and docs said. Should/can I reenable autocompaction
 again with incremental repairs?

 Cheers,
 Roland






Re: Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread Eric Stevens
Are you using Leveled compaction strategy?  If you fall behind on
compaction in leveled (and you will during bootstrap), by default Cassandra
will fall back to size tiered compaction in level 0.  This will cause
SSTables larger than sstable_size_in_mb, and those will be recompacted away
into level 1.  When level 1 gets full, those will be recompacted away into
level 2.  If level 2 gets full, into level 3 and so on.

Leveled compaction is more I/O intensive than size tiered compaction for
reasons such as this.  The same data can be compacted numerous times before
a node settles down.  The upside is that it puts a practical upper bound on
the number of sstables which can be involved in a read (not more than one
per level, not counting false positives from bloom filters).

Leveled compaction is essentially trading increased I/O at write time
(including and especially bootstrap or repair) for decreased I/O at read
time.
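
As an aside, the LCS target size is a per-table compaction option; a
hypothetical table could be tuned with something like the following sketch
(ks.cf and the 160 MB value, the usual default, are placeholders):

// Sketch: adjust the LCS sstable target size for a hypothetical table ks.cf.
session.execute("ALTER TABLE ks.cf WITH compaction = "
        + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");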

On Thu, Jan 8, 2015 at 6:12 AM, Robert Wille rwi...@fold3.com wrote:

 After bootstrapping a node, the node repeatedly compacts the same tables
 over and over, even though my cluster is completely idle. I’ve noticed the
 same behavior after extended periods of heavy writes. I realize that during
 bootstrapping (or extended periods of heavy writes) compaction could
 get seriously behind, but once a table has been compacted, I don’t see the
 need to recompact the table dozens more times.

 Possibly related, I often see that OpsCenter reports that nodes have a
 large number of pending tasks, when the Pending column of the Thread Pool
 Stats doesn’t reflect that.

 Robert




Re: How to bulkload into a specific data center?

2015-01-08 Thread Ryan Svihla
Just noticed you'd sent this to the dev list. This is a question for the
user list only; please do not send questions of this type to the developer
list.

On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla r...@foundev.pro wrote:

 The nature of replication factor is such that writes will go wherever
 there is replication. If you want responses to be faster, and don't want
 the REST data center involved in the Spark job's write path, I suggest
 using a CQL driver with the LOCAL_ONE or LOCAL_QUORUM consistency level
 (look at the Spark Cassandra connector here:
 https://github.com/datastax/spark-cassandra-connector ). While write
 traffic will still be replicated to the REST service data center, because
 you do want those results available, you will not be waiting on the remote
 data center to respond successfully.

 Final point: bulk loading sends a copy per replica across the wire, so
 let's say you have RF=3 in each data center; that means bulk loading will
 send out 6 copies from that client at once, whereas normal mutations via
 Thrift or CQL writes go out as 1 copy between data centers, and that node
 then forwards on to the other replicas. This means cross-data-center
 traffic in this case would be 3x more with the bulk loader than with a
 traditional CQL or Thrift based client.



 On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang bewang.t...@gmail.com wrote:

 I set up two virtual data centers, one for analytics and one for REST
 service. The analytics data center sits on top of the Hadoop cluster. I
 want to bulk load my ETL results into the analytics data center so that
 the REST service won't have the heavy load. I'm using CQLTableInputFormat
 in my Spark application, and I gave the nodes in the analytics data center
 as the initial addresses.

 However, I found my jobs were connecting to the REST service data center.

 How can I specify the data center?




 --

 Thanks,
 Ryan Svihla




-- 

Thanks,
Ryan Svihla


Re: Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread mck

 Are you using Leveled compaction strategy? 


And if you're using the Date Tiered compaction strategy on a table that
isn't time-series data (for example, where deletes happen), you'll find it
compacting over and over.
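
A quick way to double-check which strategy a table actually uses, sketched
here against a hypothetical ks/cf via the Java driver (DESCRIBE TABLE in
cqlsh shows the same thing):

// Sketch: read the configured compaction strategy for a hypothetical table ks.cf.
Row r = session.execute(
        "SELECT compaction_strategy_class FROM system.schema_columnfamilies "
        + "WHERE keyspace_name = 'ks' AND columnfamily_name = 'cf'").one();
System.out.println(r.getString("compaction_strategy_class"));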

~mck


Re: How to bulkload into a specific data center?

2015-01-08 Thread Ryan Svihla
The nature of replication factor is such that writes will go wherever there
is replication. If you want responses to be faster, and don't want the REST
data center involved in the Spark job's write path, I suggest using a CQL
driver with the LOCAL_ONE or LOCAL_QUORUM consistency level (look at the
Spark Cassandra connector here:
https://github.com/datastax/spark-cassandra-connector ). While write
traffic will still be replicated to the REST service data center, because
you do want those results available, you will not be waiting on the remote
data center to respond successfully.
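
For illustration only, a minimal Java-driver sketch of that approach; the
contact point, the DC name "analytics" and the ks.cf schema are placeholders,
and the Spark connector exposes equivalent settings:

import com.datastax.driver.core.*;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

// Sketch: route requests to the local (analytics) DC and wait only on local replicas.
Cluster cluster = Cluster.builder()
        .addContactPoint("analytics-node-1")   // placeholder contact point
        .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("analytics"))
        .build();
Session session = cluster.connect();

Statement write = new SimpleStatement(
        "INSERT INTO ks.cf (x, y, z) VALUES (?, ?, ?)", 1, 2, 3)  // placeholder schema
        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
session.execute(write);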

Final point: bulk loading sends a copy per replica across the wire, so let's
say you have RF=3 in each data center; that means bulk loading will send out
6 copies from that client at once, whereas normal mutations via Thrift or CQL
writes go out as 1 copy between data centers, and that node then forwards on
to the other replicas. This means cross-data-center traffic in this case
would be 3x more with the bulk loader than with a traditional CQL or Thrift
based client.



On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang bewang.t...@gmail.com wrote:

 I set up two virtual data centers, one for analytics and one for REST
 service. The analytics data center sits on top of the Hadoop cluster. I
 want to bulk load my ETL results into the analytics data center so that
 the REST service won't have the heavy load. I'm using CQLTableInputFormat
 in my Spark application, and I gave the nodes in the analytics data center
 as the initial addresses.

 However, I found my jobs were connecting to the REST service data center.

 How can I specify the data center?




-- 

Thanks,
Ryan Svihla


Updated JMX metrics overview

2015-01-08 Thread Reik Schatz
I am adding some JMX metrics to our monitoring tool. Is there a good
overview of all existing JMX metrics and their descriptions?
http://wiki.apache.org/cassandra/Metrics has only a few metrics.

In particular I have questions about these now:

org.apache.cassandra.db:type=StorageProxy, TotalHints - is this the
number of hints since the node was started, or a lifetime value?

org.apache.cassandra.db:type=StorageProxy, ReadRepairRepairedBackground
- is this the total number of background read repair requests that the
node received since restart?

org.apache.cassandra.db:type=StorageProxy, ReadRepairRepairedBlocking -
same as above, I assume that's the number of read repairs that have
blocked a query (when does that even happen - aren't read repairs run
in the background?)

org.apache.cassandra.metrics:type=Storage,name=Exceptions, Count - is
this the total number of unhandled exceptions since node restart?
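
For context, a minimal sketch of reading one of these attributes over JMX
(host, port and the ObjectName/attribute are placeholders based on the names
above):

import javax.management.*;
import javax.management.remote.*;

// Sketch: read StorageProxy.TotalHints from a node's JMX endpoint (default port 7199).
static void readTotalHints() throws Exception {
    JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://cassandra-node-1:7199/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
        MBeanServerConnection conn = jmxc.getMBeanServerConnection();
        Object totalHints = conn.getAttribute(
                new ObjectName("org.apache.cassandra.db:type=StorageProxy"), "TotalHints");
        System.out.println("TotalHints = " + totalHints);
    }
}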


Re: incremental repairs

2015-01-08 Thread Roland Etzenhammer

Hi Marcus,

thanks a lot for those pointers. Now further testing can begin - and
I'll wait for 2.1.3. Right now repair times on production are really
painful; maybe that will get better. At least I hope so :-)


Re: incremental repairs

2015-01-08 Thread Robert Coli
On Thu, Jan 8, 2015 at 12:28 AM, Marcus Eriksson krum...@gmail.com wrote:

 But, if you are running 2.1 in production, I would recommend that you wait
 until 2.1.3 is out, https://issues.apache.org/jira/browse/CASSANDRA-8316
 fixes a bunch of issues with incremental repairs


There are other serious issues with 2.1.2; +1, I recommend no one run it in
production. :D

=Rob


High read latency after data volume increased

2015-01-08 Thread Roni Balthazar
Hi there,

We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2.

While our data volume is increasing (34 TB now), we are running into
some problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1,
CL.LOCAL_ONE). At the same time the load average is about 20-30 on all
DC1 nodes (8-core CPU, 32 GB RAM). C* starts timing out connections.
In this scenario OpsCenter has some issues as well: it resets all graph
layouts back to the default on every refresh, and doesn't go back to normal
after the load decreases. I only managed to get OpsCenter back to its
normal behavior by reinstalling it.
Just for reference, we are using SATA HDDs on all nodes. Running
hdparm to check disk performance under this load, some nodes are
reporting very low read rates (under 10 MB/sec), while others are above
100 MB/sec. Under low load this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete. The last repair was 20 days
ago. Running repair under high load brings some nodes down with the
exception: "JVMStabilityInspector.java:94 - JVM state determined
to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError:
Java heap space".

Any hints?

Regards,

Roni Balthazar


Re: High read latency after data volume increased

2015-01-08 Thread Robert Coli
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com
wrote:

 We are using C* 2.1.2 with 2 DCs. 30 nodes DC1 and 10 nodes DC2.


https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

2.1.2 in particular is known to have significant issues. You'd be better
off running 2.1.1 ...

=Rob