Re: Cassandra ring not behaving like a ring

2014-01-15 Thread Narendra Sharma
RF=3.
On Jan 15, 2014 1:18 PM, "Andrey Ilinykh"  wrote:

> what is the RF? What does nodetool ring show?
>
>
> On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma <
> narendra.sha...@gmail.com> wrote:
>
>> Sorry for the odd subject, but something is wrong with our Cassandra ring.
>> We have a 9-node ring as below.
>>
>> N1 - UP/NORMAL
>> N2 - UP/NORMAL
>> N3 - UP/NORMAL
>> N4 - UP/NORMAL
>> N5 - UP/NORMAL
>> N6 - UP/NORMAL
>> N7 - UP/NORMAL
>> N8 - UP/NORMAL
>> N9 - UP/NORMAL
>>
>> Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.
>>
>> I added a new node with a token that is exactly in the middle of N6 and N7, so
>> the ring displayed as follows:
>> N1 - UP/NORMAL
>> N2 - UP/NORMAL
>> N3 - UP/NORMAL
>> N4 - UP/NORMAL
>> N5 - UP/NORMAL
>> N6 - UP/NORMAL
>> N6.5 - UP/JOINING
>> N7 - UP/NORMAL
>> N8 - UP/NORMAL
>> N9 - UP/NORMAL
>>
>>
>> I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
>> stream (worst case) from N5, N6, N7, N8. What could potentially cause the
>> node to get confused about the ring?
>>
>> --
>> Narendra Sharma
>> Software Engineer
>> http://www.aeris.com
>> http://narendrasharma.blogspot.com/
>>
>>
>


Re: Cassandra ring not behaving like a ring

2014-01-15 Thread Andrey Ilinykh
what is the RF? What does nodetool ring show?


On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma
wrote:

> Sorry for the odd subject, but something is wrong with our Cassandra ring.
> We have a 9-node ring as below.
>
> N1 - UP/NORMAL
> N2 - UP/NORMAL
> N3 - UP/NORMAL
> N4 - UP/NORMAL
> N5 - UP/NORMAL
> N6 - UP/NORMAL
> N7 - UP/NORMAL
> N8 - UP/NORMAL
> N9 - UP/NORMAL
>
> Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.
>
> I added a new node with a token that is exactly in the middle of N6 and N7, so
> the ring displayed as follows:
> N1 - UP/NORMAL
> N2 - UP/NORMAL
> N3 - UP/NORMAL
> N4 - UP/NORMAL
> N5 - UP/NORMAL
> N6 - UP/NORMAL
> N6.5 - UP/JOINING
> N7 - UP/NORMAL
> N8 - UP/NORMAL
> N9 - UP/NORMAL
>
>
> I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
> stream (worst case) from N5, N6, N7, N8. What could potentially cause the
> node to get confused about the ring?
>
> --
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
>
>


Cassandra ring not behaving like a ring

2014-01-15 Thread Narendra Sharma
Sorry for the odd subject, but something is wrong with our Cassandra ring.
We have a 9-node ring as below.

N1 - UP/NORMAL
N2 - UP/NORMAL
N3 - UP/NORMAL
N4 - UP/NORMAL
N5 - UP/NORMAL
N6 - UP/NORMAL
N7 - UP/NORMAL
N8 - UP/NORMAL
N9 - UP/NORMAL

Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS.

I added a new node with a token that is exactly in the middle of N6 and N7, so
the ring displayed as follows:
N1 - UP/NORMAL
N2 - UP/NORMAL
N3 - UP/NORMAL
N4 - UP/NORMAL
N5 - UP/NORMAL
N6 - UP/NORMAL
N6.5 - UP/JOINING
N7 - UP/NORMAL
N8 - UP/NORMAL
N9 - UP/NORMAL


I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to
stream (worst case) from N5, N6, N7, N8. What could potentially cause the
node to get confused about the ring?
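
For reference, here is a minimal sketch (an assumption of how SimpleStrategy places replicas, not Cassandra source) of which existing nodes should hold the ranges the new node becomes a replica for with RF=3; the node names stand in for the real tokens:

RF = 3
ring = ["N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8", "N9"]  # clockwise by token

def replicas(owner_index, rf=RF):
    # SimpleStrategy: the primary owner of a range plus the next rf-1 nodes clockwise
    return [ring[(owner_index + i) % len(ring)] for i in range(rf)]

# N6.5 takes a token between N6 and N7, so it becomes a replica for the range it
# splits off N7 plus the RF-1 ranges before it, currently primary-owned by N6 and N5.
affected_primaries = ["N7", "N6", "N5"]

sources = set()
for primary in affected_primaries:
    sources.update(replicas(ring.index(primary)))

print(sorted(sources))  # ['N5', 'N6', 'N7', 'N8', 'N9'] -- N1 and N2 never appear

Under that model every candidate streaming source sits between N5 and N9, so streaming from N1 and N2 does look inconsistent with a healthy ring.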

-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/


RE: Question about out of sync

2014-01-15 Thread Logendran, Dharsan (Dharsan)
Thanks Rob,

It was quick.

Dharsan


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: January-15-14 2:44 PM
To: user@cassandra.apache.org
Subject: Re: Question about out of sync

On Wed, Jan 15, 2014 at 11:36 AM, Logendran, Dharsan (Dharsan)
<dharsan.logend...@alcatel-lucent.com> wrote:
Let's say we have a two-node cluster (node_a and node_b) with a replication
factor of 2. That means both nodes (node_a and node_b) will hold the same
replicated data. If one node (node_b) goes down and comes back, is there any
way for us to find out how far node_b is behind (out of sync) or how much it
needs to catch up? The amount could be in bytes, number of rows per column
family, etc.

You can see how many hints node_a has stored for node_b, and look at how large
that is on disk, but that's about it. If you exhaust max_hint_window_in_ms,
there is literally no way to know how out of sync you are until you run repair.

Other notes :

1) If you're asking this question, you probably have not internalized what the 
"eventual" part of eventual consistency really means to your application.
2) RF=N=2 is not a recommended production configuration. Minimal recommended is 
usually RF=3, N=6.

=Rob



Re: Question about out of sync

2014-01-15 Thread Robert Coli
On Wed, Jan 15, 2014 at 11:36 AM, Logendran, Dharsan (Dharsan) <
dharsan.logend...@alcatel-lucent.com> wrote:

>  Let's say we have a two-node cluster (node_a and node_b) with a
> replication factor of 2. That means both nodes (node_a and node_b) will
> hold the same replicated data. If one node (node_b) goes down and comes
> back, is there any way for us to find out how far node_b is behind (out of
> sync) or how much it needs to catch up? The amount could be in bytes,
> number of rows per column family, etc.
>

You can see how many hints node_a has stored for node_b, and look at how
large that is on disk, but that's about it. If you exhaust
max_hint_window_in_ms, there is literally no way to know how out of sync you
are until you run repair.
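
A rough way to eyeball the first part from the node itself is to total the on-disk size of the system hints column family; a minimal sketch, assuming the default data directory layout (the hints CF name and location vary by Cassandra version):

# Rough sketch, not an official tool: total the on-disk size of the system hints
# column family to eyeball how much hinted data node_a has queued for node_b.
# The path is an assumption -- adjust it to your data_file_directories setting
# and Cassandra version.
import os

HINTS_DIR = "/var/lib/cassandra/data/system/HintsColumnFamily"  # assumed path

total_bytes = 0
for root, _dirs, files in os.walk(HINTS_DIR):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))

print("hints on disk: %.1f MB" % (total_bytes / (1024.0 * 1024.0)))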

Other notes :

1) If you're asking this question, you probably have not internalized what
the "eventual" part of eventual consistency really means to your
application.
2) RF=N=2 is not a recommended production configuration. Minimal
recommended is usually RF=3, N=6.

=Rob


Re: Question about out of sync

2014-01-15 Thread Logendran, Dharsan (Dharsan)
Hi,

I would like to get some answer for the following scenario.

Let's say we have a two-node cluster (node_a and node_b) with a replication
factor of 2. That means both nodes (node_a and node_b) will hold the same
replicated data. If one node (node_b) goes down and comes back, is there any
way for us to find out how far node_b is behind (out of sync) or how much it
needs to catch up? The amount could be in bytes, number of rows per column
family, etc.

Thanks
Dharsan



Re: Cassandra mad GC

2014-01-15 Thread Arya Goudarzi
It is not a good idea to change settings without identifying the root
cause. Chances are what you did masked the problem a bit for you, but the
problem is still there, isn't it?
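
One low-effort way to start on the root cause is to check whether the freezes line up with long stop-the-world pauses in the GC log; a minimal sketch, assuming GC logging is enabled and using a commonly seen pause line format:

# Minimal sketch: scan the JVM GC log for long stop-the-world pauses to see
# whether the freezes line up with GC. The log path and line format are
# assumptions -- they depend on the -Xloggc and
# -XX:+PrintGCApplicationStoppedTime flags actually set in cassandra-env.sh.
import re

GC_LOG = "/var/log/cassandra/gc.log"  # assumed location
PAUSE = re.compile(r"application threads were stopped: ([0-9.]+) seconds")

with open(GC_LOG) as log:
    for line in log:
        match = PAUSE.search(line)
        if match and float(match.group(1)) > 0.5:  # report pauses over 500 ms
            print(line.strip())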


On Wed, Jan 15, 2014 at 1:11 AM, Dimetrio  wrote:

> I set G1 because Cassandra started to misbehave (dropped messages) with the
> standard GC settings.
> In my opinion, Cassandra started to work more stably with G1 (it's getting
> fewer timeouts now), but it's still not ideal.
> I just want Cassandra to work fine.
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248p7592257.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
Cheers,
-Arya


Re: various Cassandra performance problems when CQL3 is really used

2014-01-15 Thread Ondřej Černoš
Hi,

by the way, some of the issues are summarised here:
https://issues.apache.org/jira/browse/CASSANDRA-6586 and here:
https://issues.apache.org/jira/browse/CASSANDRA-6587.

regards,

ondrej cernos


On Tue, Jan 14, 2014 at 9:48 PM, Ondřej Černoš  wrote:

> Hi,
>
> thanks for the answer and sorry for the delay. Let me answer inline.
>
>
> On Wed, Dec 18, 2013 at 4:53 AM, Aaron Morton wrote:
>
>> > * select id from table where token(id) > token(some_value) and
>> secondary_index = other_val limit 2 allow filtering;
>> >
>> > Filtering absolutely kills the performance. On a table populated with
>> 130,000 records, a single-node Cassandra server (on my i7 notebook, 2GB of
>> JVM heap) and a secondary index built on a column with a low-cardinality
>> value set, this query takes 156 seconds to finish.
>> Yes, this is why you have to add allow_filtering. You are asking the
>> nodes to read all the data that matches and filter it in memory; that’s a
>> SQL-type operation.
>>
>> Your example query is somewhat complex and I doubt it could get decent
>> performance; what does the query plan look like?
>>
>
> I don't know. How do I find out? The only mention of a query plan in
> Cassandra I found is the article on your site, from 2011 and covering
> version 0.8.
>
> The example query gets computed in a fraction of the time if I perform
> just the fetch of all rows matching the token function and perform the
> filtering client side.
>
>
>
>> IMHO you need to do further de-normalisation; you will get the best
>> performance when you select rows by their full or partial primary key.
>
>
> I denormalize all the way I can. The problem is I need to support paging
> and filtering at the same time. The API I must support allows filtering by
> example and paging - so how should I denormalize? Should I somehow manage
> pages of primary row keys manually? Or should I have a manual secondary
> index and page somehow in the denormalized wide row?
>
> The trouble goes even further; even this doesn't perform well:
>
> select id from table where token(id) > token(some_value) and pk_cluster =
> 'val' limit N;
>
> where id and pk_cluster are the primary key (CQL3 table). I guess this should
> be an ordered row query and an ordered column slice query, so where is the
> problem with performance?
>
>
>
>> > By the way, the performance is an order of magnitude better if this patch
>> is applied:
>> That looks like it’s tuned to your specific need; it would ignore the max
>> results included in the query.
>
>
> It is tuned; it only demonstrates that the heuristic doesn't work well.
>
>
>> > * select id from table;
>> >
>> > As we saw in the trace log, the query - although it queries just row
>> ids - scans all columns of all the rows and (probably) compares TTL with
>> current time (?) (we saw hundreds of thousands of gettimeofday(2)). This
>> means that if the table somehow mixes wide and narrow rows, the performance
>> suffers horribly.
>> Selecting all rows from a table requires a range scan, which reads all rows
>> from all nodes. It should never be used in production.
>>
>
> The trouble is I just need to perform it sometimes. I know what the
> problem with the query is, but I have just a couple of hundred thousand
> records (150,000) - the dataset can be stored entirely in memory and the
> SSTables can be fully mmapped. There is no reason for this query to be slow
> in this case.
>
>
>> Not sure what you mean by “scans all columns from all rows”; a select by
>> column name will use a SliceByNamesReadCommand, which will only read the
>> required columns from each SSTable (it normally short-circuits, though, and
>> reads from fewer).
>>
>
> The query should fetch only IDs, yet it checks the TTLs of columns. That is
> the point. Why does it do that?
>
>
>> If there is a TTL, the ExpiringColumn.localExpirationTime must be checked;
>> if there is no TTL, it will not be checked.
>
>
> It is a standard CQL3 table with an ID, a couple of columns and a CQL3
> collection. I didn't set any TTL on the table or its columns.
>
>
>> > As Cassandra checks all the columns in selects, performance suffers
>> badly if the collection is of any interesting size.
>> This is not true; could you provide an example where you think this is
>> happening?
>
>
> We saw it in the trace log. It happened in the select ID from table query.
> The table had a collection column.
>
>
>> > Additionally, we saw various random irreproducible freezes, high CPU
>> consumption when nothing happens (even with the trace log level set, no
>> activity was reported) and highly unpredictable performance characteristics
>> after nodetool flush and/or major compaction.
>> What was the HW platform and what was the load ?
>>
>
> My i7/8GB notebook with a single-node cluster, and a virtualised AWS-like
> environment with nodes of various sizes.
>
>
>> Typically freezes in the server correlate to JVM GC, the JVM GC can also
>> be using the CPU.
>> If you have wide rows or make large reads you may run into more JVM GC
>> issues.
>>
>
>
>> nodetool flush w

RE: Cassandra mad GC

2014-01-15 Thread Dimetrio
I set G1 because Cassandra started to misbehave (dropped messages) with the
standard GC settings.
In my opinion, Cassandra started to work more stably with G1 (it's getting
fewer timeouts now), but it's still not ideal.
I just want Cassandra to work fine.



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248p7592257.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Cassandra mad GC

2014-01-15 Thread Viktor Jevdokimov
Forgot to ask: what do you want to achieve by changing the default GC settings?


-Original Message-
From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
Sent: Wednesday, January 15, 2014 10:18 AM
To: user@cassandra.apache.org
Subject: RE: Cassandra mad GC

Simply don't use G1 GC; it will not be better on Cassandra than CMS, and it
could be worse.


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-03163 Vilnius,
Lithuania



-Original Message-
From: Dimetrio [mailto:dimet...@flysoft.ru]
Sent: Tuesday, January 14, 2014 3:16 PM
To: cassandra-u...@incubator.apache.org
Subject: Cassandra mad GC

Hi all.
I have many GC freezes on my Cassandra cluster.

I'm using G1 GC, and CMS gives similar freezes:

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=1"
JVM_OPTS="$JVM_OPTS -XX:NewRatio=1"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"
JVM_OPTS="$JVM_OPTS -XX:-UseAdaptiveSizePolicy"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

Heap 8GB
10 nodes, AWS c3.4xlarge
60GB per node

Cassandra logs


GC logs


Sometimes a node freezes and reports that one or two other nodes are down;
CPU load > 1000%, LA = 6-15.

pending tasks: 4
  compaction type   keyspace   table           completed    total        unit    progress
  Compaction        Social     home_timeline   4097097092   4920701908   bytes   83.26%
  Compaction        Social     home_timeline   2713279974   6272012039   bytes   43.26%
Active compaction remaining time :   0h00m32s

200-300 requests per second on each node, with many inserts and deletes (batch
size lower than 50).

How can I reduce GC freezes?

--regards





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Cassandra mad GC

2014-01-15 Thread Viktor Jevdokimov
Simply don't use G1 GC; it will not be better on Cassandra than CMS, and it
could be worse.


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-03163 Vilnius,
Lithuania



-Original Message-
From: Dimetrio [mailto:dimet...@flysoft.ru]
Sent: Tuesday, January 14, 2014 3:16 PM
To: cassandra-u...@incubator.apache.org
Subject: Cassandra mad GC

Hi all.
I have many GC freezes on my Cassandra cluster.

I'm using G1 GC, and CMS gives similar freezes:

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=1"
JVM_OPTS="$JVM_OPTS -XX:NewRatio=1"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"
JVM_OPTS="$JVM_OPTS -XX:-UseAdaptiveSizePolicy"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

Heap 8GB
10 nodes, AWS c3.4xlarge
60GB per node

Cassandra logs


GC logs


Sometimes a node freezes and reports that one or two other nodes are down;
CPU load > 1000%, LA = 6-15.

pending tasks: 4
  compaction type   keyspace   table           completed    total        unit    progress
  Compaction        Social     home_timeline   4097097092   4920701908   bytes   83.26%
  Compaction        Social     home_timeline   2713279974   6272012039   bytes   43.26%
Active compaction remaining time :   0h00m32s

200-300 requests per second on each node, with many inserts and deletes (batch
size lower than 50).

How can I reduce GC freezes?

--regards





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.