crash with OOM

2016-09-27 Thread xutom

Hi all,
I have a C* cluster with 12 nodes running Cassandra 2.1.14. Just now 
two nodes crashed, and the client fails to export data with read consistency QUORUM. 
The following are the logs of the failed nodes:

ERROR [SharedPool-Worker-159] 2016-09-26 20:51:14,124 Message.java:538 - 
Unexpected exception during request; channel = [id: 0xce43a388, 
/13.13.13.80:55536 :> /13.13.13.149:9042]
java.lang.AssertionError: null
at 
org.apache.cassandra.transport.ServerConnection.applyStateTransition(ServerConnection.java:100)
 ~[apache-cassandra-2.1.14.jar:2.1.14]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:442)
 [apache-cassandra-2.1.14.jar:2.1.14]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
 [apache-cassandra-2.1.14.jar:2.1.14]
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
 [netty-all-4.0.23.Final.jar:4.0.23.Final]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[na:1.7.0_65]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
 [apache-cassandra-2.1.14.jar:2.1.14]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[apache-cassandra-2.1.14.jar:2.1.14]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
ERROR [SharedPool-Worker-116] 2016-09-26 20:51:14,125 
JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting 
forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [SharedPool-Worker-121] 2016-09-26 20:51:14,125 
JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting 
forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [SharedPool-Worker-157] 2016-09-26 20:51:14,124 Message.java:538 - 
Unexpected exception during request; channel = [id: 0xce43a388, 
/13.13.13.80:55536 :> /13.13.13.149:9042]

My server has 256 GB of memory in total, so I set MAX_HEAP_SIZE to 60G. The config in 
cassandra-env.sh is:
MAX_HEAP_SIZE="60G"
HEAP_NEWSIZE="20G"
How can I solve this OOM?


Re:Re: Data export with consistency problem

2016-03-25 Thread xutom
Thanks for your reply!
I am sorry for my poor English.
My keyspace replication factor is 3, and the client read and write CL are both QUORUM.
If we unplug the network cable of one node, import 30 million rows of data into 
that table, and then reconnect the network cable, exporting the data immediately 
afterwards does not return all 30 million rows.
But if we instead run 'kill -9 pid' on one node, import 30 million rows of 
data into that table, and then restart Cassandra on that node, exporting the 
data immediately afterwards does return all 30 million rows.

By the way, we ran another test: we installed a C* cluster with 3 nodes, turned 
off hinted handoff, set the keyspace replication factor to 3, and used CL ALL for 
both client writes and reads. Then we manually ran kill -9 on one node, leaving 
just two healthy nodes, and we could still import data into the C* cluster. Why 
does this happen: with only two healthy nodes and a write CL of ALL, we can still 
write data into the C* cluster?


At 2016-03-25 18:26:55, "Alain RODRIGUEZ" <arodr...@gmail.com> wrote:

Hi Jerry,

It is all a matter of replication server side and consistency level client side.




The minimal setup to ensure availability and a strong consistency is RF= 3 and 
CL = (LOCAL_)QUORUM.


This way, one node can go down and you can still reach the 2 nodes needed to 
acknowledge your reads & writes --> Availability.
And as there are 3 replicas and an operation succeeds only if it succeeds on at 
least 2 replicas, at least one node will be both written to and read from, 
ensuring strong and immediate consistency (multiple reads will always return 
the same value, no matter where you read from).
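
As a concrete illustration of that setup, a minimal sketch with the DataStax Java driver 2.1 API (the same API used elsewhere in this digest) might look like the following; the contact point, keyspace and table are placeholders, not taken from this thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class QuorumSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // RF=3 keyspace, as recommended above (placeholder name "demo").
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': '3'}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.kv (id int PRIMARY KEY, value text)");

            PreparedStatement write = session.prepare("INSERT INTO demo.kv (id, value) VALUES (?, ?)");
            PreparedStatement read = session.prepare("SELECT value FROM demo.kv WHERE id = ?");
            // With RF=3, QUORUM means 2 replicas; any QUORUM write and any QUORUM read
            // overlap on at least one replica, which is the strong consistency described above.
            write.setConsistencyLevel(ConsistencyLevel.QUORUM);
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);

            session.execute(write.bind(1, "hello"));
            Row row = session.execute(read.bind(1)).one();
            System.out.println(row.getString("value"));
        }
    }
}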


Were you using those settings?


reconnect the network cable, we export the data immediately and we cannot all 
the 30 million rows of data


Not sure about 'export'  and 'we cannot all the 30 million rows'. But I imagine 
you were expecting to read the 30 million rows and did not.


Hinted Handoff is an optimisation (anything you can disable is an 
optimisation),  you can't rely on an optimisation like hinted handoff.


Let me know if this answer works for you before digging any further.


Also, I removed "d...@cassandra.apache.org" as that mailing list is used by the 
developers to discuss possible issues. There is no issue spotted so far, just us 
trying to understand things, so let's not bother those guys unless we find an 
issue :-).


C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France


The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-03-25 2:35 GMT+01:00 xutom <xutom2...@126.com>:

Hi all,
I have a C* cluster with five nodes running Cassandra 2.1.1, and 
we also have "Hinted Handoff" enabled. Everything is fine while we use the C* cluster to 
store up to 10 billion rows of data. But now we have a problem. During our 
test, after we import up to 40 billion rows of data into the C* cluster, we 
manually unplug the network cable of one node (i.e. there are 5 nodes, and we 
unplug the network cable of just one node to simulate a minor network problem in the C* 
cluster), then we create another table and import 30 million rows into it. 
Before we reconnect the network cable of that node, we export the data of the 
new table and can export all 30 million rows many times. But after we reconnect 
the network cable, if we export the data immediately we cannot get all 30 
million rows. Maybe a few minutes later, after the C* cluster has redistributed 
all the data (my guess), we do the export again and can export all 
30 million rows.
Is there something wrong with "Hinted Handoff"? While data is being copied from 
the coordinator node to the node that has just come back, can that node respond to the 
client's requests? Thanks in advance!

jerry





 




Data export with consistency problem

2016-03-24 Thread xutom
Hi all,
I have a C* cluster with five nodes running Cassandra 2.1.1, and 
we also have "Hinted Handoff" enabled. Everything is fine while we use the C* cluster to 
store up to 10 billion rows of data. But now we have a problem. During our 
test, after we import up to 40 billion rows of data into the C* cluster, we 
manually unplug the network cable of one node (i.e. there are 5 nodes, and we 
unplug the network cable of just one node to simulate a minor network problem in the C* 
cluster), then we create another table and import 30 million rows into it. 
Before we reconnect the network cable of that node, we export the data of the 
new table and can export all 30 million rows many times. But after we reconnect 
the network cable, if we export the data immediately we cannot get all 30 
million rows. Maybe a few minutes later, after the C* cluster has redistributed 
all the data (my guess), we do the export again and can export all 
30 million rows.
Is there something wrong with "Hinted Handoff"? While data is being copied from 
the coordinator node to the node that has just come back, can that node respond to the 
client's requests? Thanks in advance!

jerry


Re:Re: endless full gc on one node

2016-01-17 Thread xutom
Hi Kai Wang,
I also encountered this issue a few days ago. I have 6 nodes, and I found 2 
nodes doing endless full GC when I exported ALL data from C* using "select * from 
table". I removed all data from those 2 nodes and reinstalled Cassandra, and the 
problem went away.



At 2016-01-18 06:18:46, "Kai Wang"  wrote:

DuyHai,


In this case I didn't use batch, just bind a single PreparedStatement and 
execute. Nor did I see any warning/error about batch being too large in the log.


Thanks.



On Sat, Jan 16, 2016 at 6:27 PM, DuyHai Doan  wrote:

"As soon as inserting started, one node started non-stop full GC. The other two 
nodes were totally fine"


Just a guest, how did you insert data ? Did you use Batch statements ?


On Sat, Jan 16, 2016 at 10:12 PM, Kai Wang  wrote:

Hi,


Recently I saw some strange behavior on one of the nodes of a 3-node cluster. A 
while ago I created a table and put some data (about 150M) in it for testing. A 
few days ago I started to import full data into that table using normal cql 
INSERT statements. As soon as inserting started, one node started non-stop full 
GC. The other two nodes were totally fine. I stopped the inserting process, 
restarted C* on all the nodes. All nodes are fine. But once I started inserting 
again, full GC kicked in on that node within a minute. The insertion speed is 
moderate. Again, the other two nodes were fine. I tried this process a couple 
of times. Every time the same node jumped into full GC. I even rebooted all the 
boxes. I checked system.log but found no errors or warnings before full GC 
started.


Finally I deleted and recreated the table. All of sudden the problem went away. 
The only thing I can think of is that table was created using STCS. After I 
inserted 150M data into it, I switched it to LCS. Then I ran incremental repair 
a couple of times. I saw validation and normal compaction on that table as 
expected. When I recreated the table, I created it with LCS.
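
For reference, the STCS-to-LCS switch described above is a single ALTER TABLE. A minimal sketch with the DataStax Java driver (contact point, keyspace and table are placeholders, not the actual schema from this thread):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SwitchToLcs {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': '3'}");
            // The table starts on SizeTieredCompactionStrategy (STCS, the default)...
            session.execute("CREATE TABLE IF NOT EXISTS demo.events (id int PRIMARY KEY, payload text) "
                    + "WITH compaction = {'class': 'SizeTieredCompactionStrategy'}");
            // ...and is later switched to LeveledCompactionStrategy (LCS), as in the account above.
            session.execute("ALTER TABLE demo.events "
                    + "WITH compaction = {'class': 'LeveledCompactionStrategy'}");
        }
    }
}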


I don't have the problem any more but just want to share the experience. Maybe 
someone has a theory on this? BTW I am running C* 2.2.4 with CentOS 7 and Java 
8. All boxes have identical configurations.



Thanks.






Re:Re: Fail to export all data in C* cluster

2016-01-04 Thread xutom
Dear Jack,
Thanks!
My keyspace is as follows:
test@cqlsh> DESC KEYSPACE sky ;
CREATE KEYSPACE sky WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} AND durable_writes = true;
CREATE TABLE sky.user1 (
    pati int,
    uuid text,
    name text,
    name2 text,
    PRIMARY KEY (pati, uuid)
);

Now I am using CL=ALL for the inserts and set the retry policy with the 
following code:
RetryPolicy rp = new CustomRetryPolicy(3, 3, 2);
Cluster cluster = Cluster.builder()
        .addContactPoint(seedIp)
        .withCredentials("test", "test")
        .withRetryPolicy(rp)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        /* My cluster has 6 nodes, all in the same data center */
        .build();

PreparedStatement insertStatement = session
        .prepare("INSERT INTO " + tableName
                + " (" + columns + ") "
                + "VALUES (?, ?, ?, ?);");
insertStatement.setConsistencyLevel(ConsistencyLevel.ALL); /* Here I set the CL to ALL */
I start 30 threads to insert data; each thread uses a BatchStatement to 
insert 100 rows with the same partition key but different primary keys each 
time, and runs 10 times. After inserting about 99084500 rows into the C* cluster, 
with many timeout exceptions, I stop the insert process and then use the 
following code to export all the data into a local file:
String cqlstr = "select * from " + this.tableName
        + " where pati = " + this.partition[i];
PreparedStatement statement = session.prepare(cqlstr);
BoundStatement bStatement = new BoundStatement(statement);
bStatement.setFetchSize(1);
iter = session.execute(bStatement).iterator();
Then I write the results (in iter) to a local file. I ran this 3 times and all three 
results are different.

I have set the CL to ALL when inserting the data, so why do I get different 
results every time I export all the data?
By the way, I have set "hinted_handoff_enabled: true"; could that be the problem 
when the C* cluster is overloaded, even though I have set the CL to ALL?

Best Regards
jerry


At 2016-01-04 23:37:20, "Jack Krupansky" <jack.krupan...@gmail.com> wrote:

You have three choices:


1. Insert with CL=ALL, with client-level retries if the write fails due to the 
cluster being overloaded.
2. Insert with CL=QUORUM and then run repair after all data has been inserted.
3. Lower your insert rate in your client so that the cluster can keep up with 
your inserts.


Yes, Cassandra supports eventual consistency, but if you overload the cluster, 
the hinted handoff for nodes beyond the requested CL may time out and be 
discarded, hence the need for repair.
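
As a rough sketch of option 1 (client-level retries around a CL=ALL insert), using the Java driver 2.1 API that appears in this thread; the retry count, backoff and exception choice here are arbitrary assumptions, not a prescribed policy:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class WriteWithRetry {
    // Executes a CL=ALL write, retrying a few times if the cluster is momentarily overloaded.
    static void writeWithRetry(Session session, BoundStatement bound, int maxAttempts)
            throws InterruptedException {
        bound.setConsistencyLevel(ConsistencyLevel.ALL);
        for (int attempt = 1; ; attempt++) {
            try {
                session.execute(bound);
                return;
            } catch (WriteTimeoutException | NoHostAvailableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after maxAttempts tries
                }
                Thread.sleep(200L * attempt); // simple linear backoff before retrying
            }
        }
    }
}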


What CL are you currently using for inserts?




-- Jack Krupansky


On Mon, Jan 4, 2016 at 9:52 AM, xutom <xutom2...@126.com> wrote:

Hi all,

I have a C* cluster with 6 nodes running Cassandra 2.1.1. I start 50 
threads to insert data into the C* cluster; each thread inserts up to about 100 
million rows with the same partition key. After inserting all the data, I 
start another app with 50 threads to export all the data to a local file, 
using a query like: select * from table where partition_id=xxx (each partition has 
about 100 million rows). But unfortunately I fail to export all the data: I 
ran this 3 times, and each time I got a different number of results. If I had 
successfully exported all the data, I should get the same number of 
results every time, right?

Best Regards





 




Fail to export all data in C* cluster

2016-01-04 Thread xutom
Hi all,

I have a C* cluster with 6 nodes running Cassandra 2.1.1. I start 50 
threads to insert data into the C* cluster; each thread inserts up to about 100 
million rows with the same partition key. After inserting all the data, I 
start another app with 50 threads to export all the data to a local file, 
using a query like: select * from table where partition_id=xxx (each partition has 
about 100 million rows). But unfortunately I fail to export all the data: I 
ran this 3 times, and each time I got a different number of results. If I had 
successfully exported all the data, I should get the same number of 
results every time, right?

Best Regards


cassandra full gc too long

2015-12-28 Thread xutom
Hi all,
I have 5 nodes in my C* cluster, and each node has the same configuration 
file (cassandra-env.sh: MAX_HEAP_SIZE="32G" and HEAP_NEWSIZE="8G"). My 
Cassandra version is 2.1.1. Now I want to export all data of one table; I am 
using select * from tablename and have set bStatement.setFetchSize(1); 
While exporting the data I checked the status of the C* cluster and found the 
following:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns  Host ID                               Rack
UN  134.13.13.36  15.82 GB  256     ?     67aecea2-00f5-4001-ba4c-dc6138edaaa0  rack1
UN  134.13.13.37  20.25 GB  256     ?     a8c0ec24-6e48-4082-be53-32a971a4      rack1
DN  134.13.13.33  18.36 GB  256     ?     d93f8201-a3a7-404a-8dd6-164ddbfeaaa7  rack1
UN  134.13.13.34  16.9 GB   256     ?     6880cf40-0b73-4bbc-a0a5-7799a983aaab  rack1
UN  134.13.13.35  19.78 GB  256     ?     5658ae76-40a0-441f-908e-9d11ab2baaad  rack1
The Cassandra jstat output on node 134.13.13.33 is below. I can see 
that Cassandra does not use S1 and does not run young GCs; instead it 
only runs full GCs, and it runs so many of them, taking so long, that the C* cluster 
thinks the node is down. Node 134.13.13.33 only works normally for a 
while after startup; after many full GCs it is marked down, cannot finish the full GC, 
and finally crashes.
[cache@cache02 caslog]$ jstat -gcutil 62335 4000 2000
  S0     S1      E      O      P  YGC    YGCT  FGC    FGCT      GCT
 86.49   0.00 100.00  98.02  72.54 19   38.236 3    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.54 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.52 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.52 19   38.236 4    0.863   39.099
 86.49   0.00 100.00  98.02  72.52 19   38.236 4    0.863   39.099
  0.00   0.00  41.13 100.00  72.52 19   38.236 4   71.671  109.906
  0.00   0.00  41.13 100.00  72.52 19   38.236 4   71.671  109.906
  0.00   0.00  41.13 100.00  72.52 19   38.236 4   71.671  109.906
  0.00   0.00  41.13 100.00  72.52 19   38.236 4   71.671  109.906
  0.00   0.00  77.65 100.00  60.03 19   38.236 5   71.671  109.906
 20.27   0.00 100.00 100.00  60.27 19   38.236 5   73.905  112.141
 58.59   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
 98.19   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 5   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  112.141
100.00   0.00 100.00 100.00  60.30 19   38.236 6   73.905  

Re:Re: cassandra full gc too long

2015-12-28 Thread xutom


Thanks for your reply. I have actually split that table into 47*20=940 parts: I 
have 47 partitions and each partition also has 20 buckets, so every time I 
execute a CQL query such as: select * from table where partition_id=a and bucket_id=b, 
each select returns maybe 40-80 million rows. 
What does "modern CQL client with paging support" mean? Is there an open-source CQL 
client? I do not use any open-source CQL client; I export the data with my own 
Java code.
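
For what it is worth, the DataStax Java driver 2.x that this code already uses does support paging: the driver fetches rows page by page while you iterate the result set, and setFetchSize() controls the page size. A rough sketch with placeholder contact point, keyspace, table and column types:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class PagedExport {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) {
            PreparedStatement ps = session.prepare(
                    "SELECT * FROM bigtable WHERE partition_id = ? AND bucket_id = ?");
            BoundStatement bound = ps.bind(1, 2); // placeholder partition and bucket values
            // Request 5000 rows per page instead of the whole bucket at once;
            // the driver transparently fetches the next page as the iteration advances.
            bound.setFetchSize(5000);
            long count = 0;
            for (Row row : session.execute(bound)) {
                count++; // write the row to the local file here
            }
            System.out.println("exported " + count + " rows");
        }
    }
}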



At 2015-12-29 11:38:51, "Robert Coli" <rc...@eventbrite.com> wrote:

On Mon, Dec 28, 2015 at 5:57 PM, xutom <xutom2...@126.com> wrote:

I have 5 nodes in my C* cluster, and each node has the same configuration 
file(Cassandra-env.sh: MAX_HEAP_SIZE="32G" and HEAP_NEWSIZE="8G"), and My 
Cassandra version is 2.1.1. Now I want to export all data of one table, i am 
using  select * from tablename,


Probably lower your heap size. If you're using CMS GC with 32gb heap you will 
get long GC pauses.


Also use a modern CQL client with paging support.


In addition, upgrade to the head of 2.1.x, 2.1.1 is not a version anyone should 
be using in production at this time.


=Rob



Fail to select ALL the datas in C*

2015-12-11 Thread xutom
Hi all,
   Now we insert 1 billion rows or more of data into C*, and then we use a 
SELECT command to export the data into local files. But each time we use a 
SELECT command such as: SELECT * from table where id = xxx and id2 > value1 and id2 <= 
value2; to query the data in C* and then export it into local or 
HDFS files, we get a different number of rows. Does anybody know the reason?

Thanks,
jerry


Re:Re: Re: Cassandra Tuning Issue

2015-12-08 Thread xutom



Dear Jack,
Thank you very much! Now we have much better performance when we insert rows with 
the same partition key in the same batch.

jerry


At 2015-12-07 13:08:31, "Jack Krupansky" <jack.krupan...@gmail.com> wrote:

If you combine inserts for multiple partition keys in the same batch you negate 
most of the effect of token-aware routing. It's best to insert only rows with 
the same partition key in a single batch. You also need to set the partition 
key for routing for the batch.
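
A minimal sketch of a single-partition batch along those lines, reusing the sky.user1 table shown earlier in this digest (the partition key value and row count are arbitrary):

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SamePartitionBatch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("sky")) {
            PreparedStatement insert = session.prepare(
                    "INSERT INTO user1 (pati, uuid, name, name2) VALUES (?, ?, ?, ?)");
            int pati = 42; // one partition key per batch
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            for (int i = 0; i < 100; i++) {
                // Every bound statement shares the same partition key, so the whole batch
                // lands on one replica set; the driver can derive the batch's routing key
                // from the bound partition key column for token-aware routing.
                batch.add(insert.bind(pati, "uuid-" + i, "name-" + i, "name2-" + i));
            }
            session.execute(batch);
        }
    }
}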


Also, RF=2 is not recommended since it does not permit quorum operations if a 
replica node is down. RF=3 is generally more appropriate.


-- Jack Krupansky


On Sun, Dec 6, 2015 at 10:27 PM, xutom <xutom2...@126.com> wrote:

Dear all,
Thanks for your reply!
I am using Apache Cassandra 2.1.1 and my JDK is 1.7.0_79; my keyspace 
replication factor is 2, and I do enable "token aware" routing. The GC configuration 
is the default:
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
I checked the GC log (gc.log.0.current) and found there is only one full 
GC. The stop-the-world times are low:
CMS-initial-mark: 0.2747280 secs
CMS-remark: 0.3623090 secs

The insert code in my test client is as follows:
String content = RandomStringUtils.randomAlphabetic(120);
cluster = Cluster.builder()
        .addContactPoint(this.seedIP)
        .withCredentials("test", "test")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
session = cluster.connect("demo");
..

PreparedStatement insertPreparedStatement = session.prepare(
        "INSERT INTO teacher (id, lastname, firstname, city) " +
        "VALUES (?, ?, ?, ?);");

// The batch is created once outside the loop and is not cleared between iterations.
BatchStatement batch = new BatchStatement();
for (; i < max; i += 5) {
    try {
        batch.add(insertPreparedStatement.bind(i,     "Entre Nous", "adsfasdfa1", content));
        batch.add(insertPreparedStatement.bind(i + 1, "Entre Nous", "adsfasdfa2", content));
        batch.add(insertPreparedStatement.bind(i + 2, "Entre Nous", "adsfasdfa3", content));
        batch.add(insertPreparedStatement.bind(i + 3, "Entre Nous", "adsfasdfa4", content));
        batch.add(insertPreparedStatement.bind(i + 4, "Entre Nous", "adsfasdfa5", content));

        //System.out.println("the id is " + i);
        session.execute(batch);

        thisTimeCount += 5;
    } catch (Exception e) {
        // (exception handling elided in the original message)
    }
}





At 2015-12-07 00:40:06, "Graham Sanderson" <gra...@vast.com> wrote:

What version of C* are you using; what JVM version - you showed a partial GC 
config but if that is still CMS (not G1) then you are going to have insane GC 
pauses... 


Depending on C* versions are you using on/off heap memtables and what type


Those are the sorts of issues related to fat nodes; I'd be worried about - we 
run very nicely at 20G total heap and 8G new - the rest of our 128G memory is 
disk cache/mmap and all of the off heap stuff so it doesn't go to waste


That said I think Jack is probably on the right path with overloaded 
coordinators- though you'd still expect to see CPU usage unless your timeouts 
are too low for the load, In which case the coordinator would be getting no 
responses in time and quite possibly the other nodes are just dropping the 
mutations (since they don't get to them before they know the coordinator would 
have timed out) - I forget the command to check dropped mutations off the top 
of my head but you can see it in opcenter


If you have GC problems you certainly
Expect to see GC cpu usage but depending on how long you run your tests it 
might take you a little while to run thru 40G


I'm personally not a fan of >32G (ish) heaps as you can't do compressed oops 
and also it is unrealistic for CMS ... The word is that G1 is now working ok 
with C* especially on newer C* and JDK versions, but that said it takes quite a 
lot of thru-put to require insane quantities of young gen... We are guessing 
that when we remove all our legacy thrift batch inserts we will need less - and 
as for 20G total we actually don't need that much (we dropped from 24 when we 
moved memtables off heap, and believe we can drop further)

Sent from my iPhone

On Dec 6, 2015, at 9:07 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:


What replication factor

Re:Re: Re: Re: Cassandra Tuning Issue

2015-12-08 Thread xutom
Hi Anuj,
Thanks! I will retry now!
By the way, how do I "inform the C* email list as well so that others know", as 
Jack said? I am sorry I have not done that yet.

Thanks
jerry


At 2015-12-09 01:09:07, "Anuj Wadehra" <anujw_2...@yahoo.co.in> wrote:

Hi Jerry,


It's great that you got a performance improvement. Moreover, I agree with what 
Graham said. I think you are using extremely large heaps with CMS, and in a 
very odd ratio: having 40G for the new gen and leaving only 20G for the old gen 
seems unreasonable. It's hard to believe that you are getting reasonable GC 
pauses; please recheck. I would suggest you test your performance with a much 
smaller heap, maybe a 16G max heap and a 4G new gen. Moreover, make sure that you 
apply all of the recommended production settings suggested by DataStax at 
http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html


Don't worry about wasting your memory; it will be used for OS caching and you 
can get even better performance.


Thanks
Anuj

Sent from Yahoo Mail on Android

From:"Jack Krupansky" <jack.krupan...@gmail.com>
Date:Tue, 8 Dec, 2015 at 8:07 pm
Subject:Re: Re: Re: Cassandra Tuning Issue


Great! Make sure to inform the C* email list as well so that others know.


-- Jack Krupansky


On Tue, Dec 8, 2015 at 7:44 AM, xutom <xutom2...@126.com> wrote:




Dear Jack,
Thank you very much! Now we have much better performance when we insert rows with 
the same partition key in the same batch.

jerry


At 2015-12-07 13:08:31, "Jack Krupansky" <jack.krupan...@gmail.com> wrote:

If you combine inserts for multiple partition keys in the same batch you negate 
most of the effect of token-aware routing. It's best to insert only rows with 
the same partition key in a single batch. You also need to set the partition 
key for routing for the batch.


Also, RF=2 is not recommended since it does not permit quorum operations if a 
replica node is down. RF=3 is generally more appropriate.


-- Jack Krupansky


On Sun, Dec 6, 2015 at 10:27 PM, xutom <xutom2...@126.com> wrote:

Dear all,
Thanks for your reply!
I am using Apache Cassandra 2.1.1 and my JDK is 1.7.0_79; my keyspace 
replication factor is 2, and I do enable "token aware" routing. The GC configuration 
is the default:
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
I checked the GC log (gc.log.0.current) and found there is only one full 
GC. The stop-the-world times are low:
CMS-initial-mark: 0.2747280 secs
CMS-remark: 0.3623090 secs

The insert code in my test client is as follows:
String content = RandomStringUtils.randomAlphabetic(120);
cluster = Cluster.builder()
        .addContactPoint(this.seedIP)
        .withCredentials("test", "test")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
session = cluster.connect("demo");
..

PreparedStatement insertPreparedStatement = session.prepare(
        "INSERT INTO teacher (id, lastname, firstname, city) " +
        "VALUES (?, ?, ?, ?);");

// The batch is created once outside the loop and is not cleared between iterations.
BatchStatement batch = new BatchStatement();
for (; i < max; i += 5) {
    try {
        batch.add(insertPreparedStatement.bind(i,     "Entre Nous", "adsfasdfa1", content));
        batch.add(insertPreparedStatement.bind(i + 1, "Entre Nous", "adsfasdfa2", content));
        batch.add(insertPreparedStatement.bind(i + 2, "Entre Nous", "adsfasdfa3", content));
        batch.add(insertPreparedStatement.bind(i + 3, "Entre Nous", "adsfasdfa4", content));
        batch.add(insertPreparedStatement.bind(i + 4, "Entre Nous", "adsfasdfa5", content));

        //System.out.println("the id is " + i);
        session.execute(batch);

        thisTimeCount += 5;
    } catch (Exception e) {
        // (exception handling elided in the original message)
    }
}





At 2015-12-07 00:40:06, "Graham Sanderson" <gra...@vast.com> wrote:

What version of C* are you using; what JVM version - you showed a partial GC 
config but if that is still CMS (not G1) then you are going to have insane GC 
pauses... 


Depending on C* versions are you using on/off heap memtables and what type


Those are the sorts of issues related to fat nodes; I'd be worried about - we 
run very nicely at 20G total heap and 8G new - the rest of our 128G memory is 
disk cache/mmap and 

Re:Re: Cassandra Tuning Issue

2015-12-06 Thread xutom
Dear all,
Thanks for your reply!
I am using Apache Cassandra 2.1.1 and my JDK is 1.7.0_79; my keyspace 
replication factor is 2, and I do enable "token aware" routing. The GC configuration 
is the default:
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
I checked the GC log (gc.log.0.current) and found there is only one full 
GC. The stop-the-world times are low:
CMS-initial-mark: 0.2747280 secs
CMS-remark: 0.3623090 secs

The insert code in my test client is as follows:
String content = RandomStringUtils.randomAlphabetic(120);
cluster = Cluster.builder()
        .addContactPoint(this.seedIP)
        .withCredentials("test", "test")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
session = cluster.connect("demo");
..

PreparedStatement insertPreparedStatement = session.prepare(
        "INSERT INTO teacher (id, lastname, firstname, city) " +
        "VALUES (?, ?, ?, ?);");

// The batch is created once outside the loop and is not cleared between iterations.
BatchStatement batch = new BatchStatement();
for (; i < max; i += 5) {
    try {
        batch.add(insertPreparedStatement.bind(i,     "Entre Nous", "adsfasdfa1", content));
        batch.add(insertPreparedStatement.bind(i + 1, "Entre Nous", "adsfasdfa2", content));
        batch.add(insertPreparedStatement.bind(i + 2, "Entre Nous", "adsfasdfa3", content));
        batch.add(insertPreparedStatement.bind(i + 3, "Entre Nous", "adsfasdfa4", content));
        batch.add(insertPreparedStatement.bind(i + 4, "Entre Nous", "adsfasdfa5", content));

        //System.out.println("the id is " + i);
        session.execute(batch);

        thisTimeCount += 5;
    } catch (Exception e) {
        // (exception handling elided in the original message)
    }
}





At 2015-12-07 00:40:06, "Graham Sanderson"  wrote:

What version of C* are you using; what JVM version - you showed a partial GC 
config but if that is still CMS (not G1) then you are going to have insane GC 
pauses... 


Depending on C* versions are you using on/off heap memtables and what type


Those are the sorts of issues related to fat nodes; I'd be worried about - we 
run very nicely at 20G total heap and 8G new - the rest of our 128G memory is 
disk cache/mmap and all of the off heap stuff so it doesn't go to waste


That said I think Jack is probably on the right path with overloaded 
coordinators- though you'd still expect to see CPU usage unless your timeouts 
are too low for the load, In which case the coordinator would be getting no 
responses in time and quite possibly the other nodes are just dropping the 
mutations (since they don't get to them before they know the coordinator would 
have timed out) - I forget the command to check dropped mutations off the top 
of my head but you can see it in opcenter


If you have GC problems you certainly
Expect to see GC cpu usage but depending on how long you run your tests it 
might take you a little while to run thru 40G


I'm personally not a fan of >32G (ish) heaps as you can't do compressed oops 
and also it is unrealistic for CMS ... The word is that G1 is now working ok 
with C* especially on newer C* and JDK versions, but that said it takes quite a 
lot of thru-put to require insane quantities of young gen... We are guessing 
that when we remove all our legacy thrift batch inserts we will need less - and 
as for 20G total we actually don't need that much (we dropped from 24 when we 
moved memtables off heap, and believe we can drop further)

Sent from my iPhone

On Dec 6, 2015, at 9:07 AM, Jack Krupansky  wrote:


What replication factor are you using? Even if your writes use CL.ONE, 
Cassandra will be attempting writes to the replica nodes in the background.


Are your writes "token aware"? If not, the receiving node has the overhead of 
forwarding the request to the node that owns the token for the primary key.


For the record, Cassandra is not designed and optimized for so-called "fat 
nodes". The design focus is "commodity hardware" and "distributed cluster" 
(typically a dozen or more nodes.)


That said, it would be good if we had a rule of thumb for how many simultaneous 
requests a node can handle, both external requests and inter-node traffic. I 
think there is an open Jira to enforce a limit on in-flight requests so that 
nodes don't get overloaded and start failing in the middle of writes, as you seem to 
be seeing.


-- Jack Krupansky


On Sun, Dec 6, 2015 at 9:29 AM, jerry  wrote:
Dear All,

Now I have a 4 nodes Cassandra cluster, and I want to know the highest