Re: Troubleshooting random node latency spikes

2017-01-23 Thread Brooke Jensen
Hi Ted.

How long are the latency spikes when they occur?  Have you investigated
compactions (nodetool compactionstats) during the spike?

Are you also seeing large latency spikes in the p95 (95th percentile)
metrics? p99 catches outliers, which aren't necessarily cause for alarm.

Are the nodes showing any other signs of stress? CPU, GC, etc? Is there
anything pending in nodetool tpstats?

Regarding the read repairs, have you tested writing at a higher consistency
level to see if that changes the number of RRs occurring?
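
For reference, a quick pass over the usual suspects on the affected node
while a spike is in progress might look like this (a sketch using standard
nodetool subcommands; run it on the node showing the spike):

    # active and pending compactions
    nodetool compactionstats
    # thread pool backlog and dropped messages
    nodetool tpstats
    # coordinator-level read/write latency percentiles
    nodetool proxyhistograms
    # GC activity since the last invocation (if available on your version)
    nodetool gcstats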


*Brooke Jensen*
VP Technical Operations & Customer Services
www.instaclustr.com | support.instaclustr.com



On 18 January 2017 at 02:11,  wrote:

> Is this Java 8 with the G1 garbage collector or CMS? With Java 7 and CMS,
> garbage collection can cause delays like you are seeing. I haven’t seen
> that problem with G1, but garbage collection is where I would start looking.
>
>
>
>
>
> Sean Durity
>
> *From:* Ted Pearson [mailto:t...@tedpearson.com]
> *Sent:* Thursday, January 05, 2017 2:34 PM
> *To:* user@cassandra.apache.org
> *Subject:* Troubleshooting random node latency spikes
>
>
>
> Greetings!
>
> I'm working on setting up a new cassandra cluster with a write-heavy
> workload (50% writes), and I've run into a strange spiky latency problem.
> My application metrics showed random latency spikes. I tracked the latency
> back to spikes on individual cassandra nodes. 
> ClientRequest.Latency.Read/Write.p99
> is occasionally jumping on one node at a time to several seconds, instead
> of its normal value of around 1000 microseconds. I also noticed
> that ReadRepair.RepairedBackground.m1_rate goes from zero to a non-zero
> (around 1-2/sec) rate during the spike on that node. I'm lost as to why
> these spikes are happening and hope someone can give me ideas.
>
> I attempted to test whether the ReadRepair metric is causally linked to
> the latency spikes. But even when I changed dclocal_read_repair_chance to
> 0 on my tables, and the metrics showed no ReadRepair.Attempted, the
> ReadRepair.RepairedBackground metric still went up during latency spikes.
> Am I misunderstanding what this metric tracks? I don't understand why it
> went up if I turned off read repair.
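>
> (For reference, a minimal sketch of the table change involved; ks.tbl is a
> placeholder. Note there are two knobs: dclocal_read_repair_chance only
> covers same-DC read repair, while read_repair_chance covers all replicas,
> so zeroing only the first can still leave background repairs happening:)
>
>     cqlsh -e "ALTER TABLE ks.tbl WITH dclocal_read_repair_chance = 0
>       AND read_repair_chance = 0;"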
>
> I'm currently running 2.2.6 in a dual-datacenter setup. It's patched to
> allow metrics to be recency-biased instead of tracking latency over the
> entire lifetime of the Java process. I'm using STCS. There is a large
> amount of data per node, about 500GB currently. I expect each row to be
> less than 10KB. It's currently running on way overpowered hardware: 512GB
> RAM, RAID 0 on NVMe, and 44 cores across 2 sockets. All of my queries
> (reads and writes) are LOCAL_ONE and I'm using RF=3.
>
>
>
> Thanks,
>
> Ted
>

Re: Decommissioned nodes show as DOWN in Cassandra versions 2.1.12 - 2.1.16

2017-01-23 Thread sai krishnam raju potturi
In Cassandra versions 2.1.12 - 2.1.16, after we decommission a node or a
datacenter, we observe the decommissioned nodes marked as DOWN in the
cluster when we run "nodetool describecluster". The nodes, however, do not
show up in the "nodetool status" command.
The decommissioned node also does not show up in the "system.peers" table
on the nodes.
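
A quick way to see the discrepancy from any live node (a sketch):

    # the decommissioned node still shows as UNREACHABLE here...
    nodetool describecluster
    # ...but is absent from both of these
    nodetool status
    cqlsh -e "SELECT peer FROM system.peers;"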

The workaround we follow is a rolling restart of the cluster, which removes
the decommissioned nodes from the UNREACHABLE state and shows the actual
state of the cluster. This workaround is tedious for huge clusters.

We also verified the decommission process with the CCM tool and observed
the same issue for clusters with versions from 2.1.12 to 2.1.16. The issue
was not observed in versions prior to or later than those mentioned above.


Has anybody in the community observed a similar issue? We've also raised a
JIRA ticket regarding this:
https://issues.apache.org/jira/browse/CASSANDRA-13144


Below are the logs observed from a version without the bug and from one
with it. The first block shows the expected behaviour; in the second, the
node is wrongly recognized as down and shows as UNREACHABLE.



Cassandra 2.1.1 logs showing the decommissioned node (without the bug):

2017-01-19 20:18:56,415 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 2049943233 for /X.X.X.X
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG StorageService Node /X.X.X.X
state left, tokens [ 59353109817657926242901533144729725259,
60254520910109313597677907197875221475,
75698727618038614819889933974570742305,
84508739091270910297310401957975430578]
2017-01-19 20:18:56,416 [GossipStage:1] DEBUG Gossiper adding expire time
for endpoint : /X.X.X.X (1485116334088)
2017-01-19 20:18:56,417 [GossipStage:1] INFO StorageService Removing
tokens [100434964734820719895982857900842892337,
114144647582686041354301802358217767299,
13209060517964702932350041942412177,
138409460913927199437556572481804704749] for /X.X.X.X
2017-01-19 20:18:56,418 [HintedHandoff:3] INFO HintedHandOffManager
Deleting any stored hints for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG MessagingService Resetting
version for /X.X.X.X
2017-01-19 20:18:56,424 [GossipStage:1] DEBUG Gossiper removing endpoint
/X.X.X.X
2017-01-19 20:18:56,437 [GossipStage:1] DEBUG StorageService Ignoring state
change for dead or unknown endpoint: /X.X.X.X
2017-01-19 20:19:02,022 [WRITE-/X.X.X.X] DEBUG OutboundTcpConnection
attempting to connect to /X.X.X.X
2017-01-19 20:19:02,023 [HANDSHAKE-/X.X.X.X] INFO OutboundTcpConnection
Handshaking version with /X.X.X.X
2017-01-19 20:19:02,023 [WRITE-/X.X.X.X] DEBUG MessagingService Setting
version 7 for /X.X.X.X
2017-01-19 20:19:08,096 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 2074454222 for /X.X.X.X
2017-01-19 20:19:54,407 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 4302985797 for /X.X.X.X
2017-01-19 20:19:57,405 [GossipTasks:1] DEBUG Gossiper 6 elapsed,
/X.X.X.X gossip quarantine over
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG ArrivalWindow Ignoring
interval time of 3047826501 for /X.X.X.X
2017-01-19 20:19:57,455 [GossipStage:1] DEBUG StorageService Ignoring state
change for dead or unknown endpoint: /X.X.X.X


Cassandra 2.1.16 logs showing the decommissioned node (the logs in 2.1.16
are the same as 2.1.1 up to "DEBUG Gossiper 6 elapsed, /X.X.X.X gossip
quarantine over", and are then followed by "NODE is now DOWN"):

2017-01-19 19:52:23,687 [GossipStage:1] DEBUG StorageService.java:1883 -
Node /X.X.X.X state left, tokens [-1112888759032625467,
-228773855963737699, -3114550423754381391, -4848625944949064281,
-6920961603460018610, -8566729719076824066, 1611098831406674636,
7278843689020594771, 7565410054791352413, 9166885764,
8654747784805453046]
2017-01-19 19:52:23,688 [GossipStage:1] DEBUG Gossiper.java:1520 - adding
expire time for endpoint : /X.X.X.X (1485114743567)
2017-01-19 19:52:23,688 [GossipStage:1] INFO StorageService.java:1965 -
Removing tokens [-1112888759032625467, -228773855963737699,
-3114550423754381391, -4848625944949064281, -6920961603460018610,
5690722015779071557, 6202373691525063547, 7191120402564284381,
7278843689020594771, 7565410054791352413, 8524200089166885764,
8654747784805453046] for /X.X.X.X
2017-01-19 19:52:23,689 [HintedHandoffManager:1] INFO
HintedHandOffManager.java:230 - Deleting any stored hints for /X.X.X.X
2017-01-19 19:52:23,689 [GossipStage:1] DEBUG MessagingService.java:840 -
Resetting version for /X.X.X.X
2017-01-19 19:52:23,690 [GossipStage:1] DEBUG Gossiper.java:417 - removing
endpoint /X.X.X.X
2017-01-19 19:52:23,691 [GossipStage:1] DEBUG StorageService.java:1552 -
Ignoring state change for dead or unknown endpoint: /X.X.X.X
2017-01-19 19:52:31,617 [MessagingService-Outgoing-/X.X.X.X] DEBUG
OutboundTcpConnection.java:372 - attempting to connect to /X.X.X.X
2017-01-19 19:52:31,618 [HANDSHAKE-/X.X.X.X] INFO
OutboundTcpConnection.java:488 - Handshaking version 

Re: Huge size of system.batches table after dropping an incomplete Materialized View

2017-01-23 Thread Vinci
Sorry about the confusion.
I meant that the sstables for the system.batches table, which were created
after dropping the MV, still persist and are huge.

 Original Message 

Subject: Re: Huge size of system.batches table after dropping an incomplete 
Materialized View
Local Time: 23 January 2017 8:40 PM
UTC Time: 23 January 2017 15:10
From: benjamin.r...@jaumo.com
To: user@cassandra.apache.org, Vinci 

What exactly persists? I didn't really understand you, could you be more 
specific?



2017-01-23 15:40 GMT+01:00 Vinci :

Thanks for the response.

After the MV failure and errors, the MV was dropped and the table was
truncated. Then I recreated the MV and the table from scratch, which worked
as expected.

The huge sstable sizes I mentioned are from after that. Somehow the files
still persist, with the same last-modification timestamps.

I'm not sure if I can safely rm these sstables or truncate system.batches
on that node.




 Original Message 
Subject: Re: Huge size of system.batches table after dropping an incomplete 
Materialized View
Local Time: 22 January 2017 11:41 PM
UTC Time: 22 January 2017 18:11
From: benjamin.r...@jaumo.com
To: user@cassandra.apache.org, Vinci 


I cannot tell you where these errors like "Attempting to mutate ..." come
from, but under certain circumstances all view mutations are stored in
batches, so the batchlog can grow insanely large. I don't see why a repair
should help you in this situation. I guess what you want is to recreate the
table.


1. You should not repair MVs directly. The current design is to repair only
the base table - though it's not properly documented. Repairing MVs can
create inconsistent states; only repairing the base tables won't.
2. A repair only repairs data and won't fix schema issues.
3. A repair of a base table that contains an MV is incredibly slow if the
state is very inconsistent (which is probably the case in your situation)

What to do?
- If you don't care about the data of the MV, you can of course delete all
its SSTables (while Cassandra is stopped) and all the data will be gone.
But I don't know if it helps.
- If you are 100% sure that no other batch logs are in flight, you could
also truncate system.batches; otherwise your log may be flooded with
"non-existant table" messages if the batch log is replayed. That is
annoying but should not harm anyone.

=> Start over, try to drop and create the MV. Watch out for logs referring to 
schema changes and errors
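
(For the truncate option above, a minimal sketch, run per affected node and
assuming your version permits truncating this system table, as the advice
implies:)

    # check the on-disk footprint first
    nodetool cfstats system.batches
    # then, if you accept losing any pending batches on this node:
    cqlsh -e "TRUNCATE system.batches;"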

Side note:
I'd recommend not using MVs (yet) unless you have an "inside" understanding
of them or "know what you are doing". They can have a very big impact on
your cluster's performance in some situations and are not generally
considered stable yet.



2017-01-22 18:42 GMT+01:00 Vinci :

Hi there,

Version :- Cassandra 3.0.7

I attempted to create a Materialized View on a certain table and it failed
with a never-ending WARN message "Mutation of  bytes is too large for the
maximum size of ".

"nodetool stop VIEW_BUILD" also did not help.

That seems to be a result of 
https://issues.apache.org/jira/browse/CASSANDRA-11670 which is fixed in newer 
versions.

So I tried dropping the view, and that generated error messages like the
following:

ERROR [CompactionExecutor:632] [Timestamp] Keyspace.java:475 - Attempting to 
mutate non-existant table 7c2e1c40-b82b-11e6-9d20-4b0190661423 
(keyspace_name.view_name)

I performed an incremental repair of the table on which the view was
created, and a rolling restart, to stop these errors.

Now I see a huge system.batches table on one of the nodes. It seems related
to the issues mentioned above, since the last-modification timestamps of
the sstable files inside system/batches match when I tried to drop the MV.

Some insight and suggestions would be very helpful. I would like to know
whether I can safely truncate the table, rm the files, or take any other
approach to clean it up.

Thanks.




--


Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer




--


Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Huge size of system.batches table after dropping an incomplete Materialized View

2017-01-23 Thread Benjamin Roth
What exactly persists? I didn't really understand you, could you be more
specific?

2017-01-23 15:40 GMT+01:00 Vinci :

> Thanks for the response.
>
> After the MV failure and errors, the MV was dropped and the table was
> truncated.
> Then I recreated the MV and the table from scratch, which worked as
> expected.
>
> The huge sstable sizes I mentioned are from after that. Somehow the files
> still persist, with the same last-modification timestamps.
>
> I'm not sure if I can safely rm these sstables or truncate system.batches
> on that node.
>
>
>  Original Message 
> Subject: Re: Huge size of system.batches table after dropping an
> incomplete Materialized View
> Local Time: 22 January 2017 11:41 PM
> UTC Time: 22 January 2017 18:11
> From: benjamin.r...@jaumo.com
> To: user@cassandra.apache.org, Vinci 
>
> I cannot tell you where these errors like "Attempting to mutate ..." come
> from, but under certain circumstances all view mutations are stored in
> batches, so the batchlog can grow insanely large. I don't see why a repair
> should help you in this situation. I guess what you want is to recreate
> the table.
>
> 1. You should not repair MVs directly. The current design is to repair
> only the base table - though it's not properly documented. Repairing MVs
> can create inconsistent states; only repairing the base tables won't.
> 2. A repair only repairs data and won't fix schema issues.
> 3. A repair of a base table that contains an MV is incredibly slow if the
> state is very inconsistent (which is probably the case in your situation)
>
> What to do?
> - If you don't care about the data of the MV, you can of course delete all
> its SSTables (while Cassandra is stopped) and all the data will be gone.
> But I don't know if it helps.
> - If you are 100% sure that no other batch logs are in flight, you could
> also truncate system.batches; otherwise your log may be flooded with
> "non-existant table" messages if the batch log is replayed. That is
> annoying but should not harm anyone.
>
> => Start over, try to drop and create the MV. Watch out for logs referring
> to schema changes and errors
>
> Side note:
> I'd recommend not using MVs (yet) unless you have an "inside"
> understanding of them or "know what you are doing". They can have a very
> big impact on your cluster's performance in some situations and are not
> generally considered stable yet.
>
> 2017-01-22 18:42 GMT+01:00 Vinci :
>
>> Hi there,
>>
>> Version :- Cassandra 3.0.7
>>
>> I attempted to create a Materialized View on a certain table and it
>> failed with a never-ending WARN message "Mutation of  bytes is too
>> large for the maximum size of ".
>>
>> "nodetool stop VIEW_BUILD" also did not help.
>>
>> That seems to be a result of https://issues.apache.org/j
>> ira/browse/CASSANDRA-11670 which is fixed in newer versions.
>>
>> So I tried dropping the view, and that generated error messages like the
>> following:
>>
>> ERROR [CompactionExecutor:632] [Timestamp] Keyspace.java:475 - Attempting
>> to mutate non-existant table 7c2e1c40-b82b-11e6-9d20-4b0190661423
>> (keyspace_name.view_name)
>>
>> I performed an incremental repair of the table on which the view was
>> created, and a rolling restart, to stop these errors.
>>
>> Now I see a huge system.batches table on one of the nodes. It seems
>> related to the issues mentioned above, since the last-modification
>> timestamps of the sstable files inside system/batches match when I tried
>> to drop the MV.
>>
>> Some insight and suggestions would be very helpful. I would like to know
>> whether I can safely truncate the table, rm the files, or take any other
>> approach to clean it up.
>>
>> Thanks.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Huge size of system.batches table after dropping an incomplete Materialized View

2017-01-23 Thread Vinci
Thanks for the response.

After the MV failure and errors, the MV was dropped and the table was
truncated. Then I recreated the MV and the table from scratch, which worked
as expected.

The huge sstable sizes I mentioned are from after that. Somehow the files
still persist, with the same last-modification timestamps.

I'm not sure if I can safely rm these sstables or truncate system.batches on
that node.




 Original Message 
Subject: Re: Huge size of system.batches table after dropping an incomplete 
Materialized View
Local Time: 22 January 2017 11:41 PM
UTC Time: 22 January 2017 18:11
From: benjamin.r...@jaumo.com
To: user@cassandra.apache.org, Vinci 


I cannot tell you where these errors like "Attempting to mutate ..." come
from, but under certain circumstances all view mutations are stored in
batches, so the batchlog can grow insanely large. I don't see why a repair
should help you in this situation. I guess what you want is to recreate the
table.


1. You should not repair MVs directly. The current design is to repair only
the base table - though it's not properly documented. Repairing MVs can
create inconsistent states; only repairing the base tables won't.
2. A repair only repairs data and won't fix schema issues.
3. A repair of a base table that contains an MV is incredibly slow if the
state is very inconsistent (which is probably the case in your situation)

What to do?
- If you don't care about the data of the MV, you can of course delete all
its SSTables (while Cassandra is stopped) and all the data will be gone.
But I don't know if it helps.
- If you are 100% sure that no other batch logs are in flight, you could
also truncate system.batches; otherwise your log may be flooded with
"non-existant table" messages if the batch log is replayed. That is
annoying but should not harm anyone.

=> Start over, try to drop and create the MV. Watch out for logs referring to 
schema changes and errors

Side note:
I'd recommend not using MVs (yet) unless you have an "inside" understanding
of them or "know what you are doing". They can have a very big impact on
your cluster's performance in some situations and are not generally
considered stable yet.



2017-01-22 18:42 GMT+01:00 Vinci :

Hi there,

Version :- Cassandra 3.0.7

I attempted to create a Materialized View on a certain table and it failed
with a never-ending WARN message "Mutation of  bytes is too large for the
maximum size of ".

"nodetool stop VIEW_BUILD" also did not help.

That seems to be a result of 
https://issues.apache.org/jira/browse/CASSANDRA-11670 which is fixed in newer 
versions.

So I tried dropping the view, and that generated error messages like the
following:

ERROR [CompactionExecutor:632] [Timestamp] Keyspace.java:475 - Attempting to 
mutate non-existant table 7c2e1c40-b82b-11e6-9d20-4b0190661423 
(keyspace_name.view_name)

I performed an incremental repair of the table on which the view was
created, and a rolling restart, to stop these errors.

Now I see a huge system.batches table on one of the nodes. It seems related
to the issues mentioned above, since the last-modification timestamps of
the sstable files inside system/batches match when I tried to drop the MV.

Some insight and suggestions would be very helpful. I would like to know
whether I can safely truncate the table, rm the files, or take any other
approach to clean it up.

Thanks.



--


Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Bulk Import Question

2017-01-23 Thread Joe Olson
I am bulk importing a large number of sstables that I pre-generated using the 
bulk load process outlined at 

https://github.com/yukim/cassandra-bulkload-example 

I am using the 'sstableloader' utility to import them into a nine-node
Cassandra cluster.

During the sstableloader execution, I sometimes get the following error in
the logs of one of the nodes:

ERROR [STREAM-OUT-/xx.xx.xx.xx:38544] 2017-01-19 13:38:52,148 
StreamSession.java:533 - [Stream #d90444c0-de7e-11e6-922a-e792f38c7245] 
Streaming error occurred on session with peer xx.xx.xx.xx through xx.xx.xx.xx 
java.io.IOException: Connection reset by peer 

I assume the load for that particular sstable failed, and the data within
was compromised and needs to be re-loaded.

My question: is there any way to trap this (and other streaming errors) when 
using sstableloader to bulk import data? 
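
One workable pattern is to key off sstableloader's exit status and retry
failed directories (a sketch; host IPs and paths are placeholders, and it
assumes the utility exits non-zero when a stream session fails, which is
worth confirming on your version):

    #!/bin/sh
    # retry each pre-generated sstable directory up to 3 times
    for dir in /data/staged/*/*; do
        n=0
        until sstableloader -d 10.0.0.1,10.0.0.2 "$dir"; do
            n=$((n+1))
            [ "$n" -ge 3 ] && { echo "FAILED: $dir" >> failed.log; break; }
        done
    done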


RE: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Abhishek Kumar Maheshwari
Thanks, Benjamin,

I found the issue: hinted handoff was turned off in cassandra.yaml.
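
For reference, the knobs involved (a sketch; the cassandra.yaml path varies
by install):

    # in cassandra.yaml:
    #   hinted_handoff_enabled: true
    # and at runtime:
    nodetool statushandoff   # check whether handoff is running
    nodetool enablehandoff   # turn it on without a restart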



Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Benjamin Roth [mailto:benjamin.r...@jaumo.com]
Sent: Monday, January 23, 2017 6:09 PM
To: user@cassandra.apache.org
Subject: Re: Getting Error while Writing in Multi DC mode when Remote Dc is 
Down.

Sorry for the short answer, I am on the run:
I guess your hints expired. The default setting is 3h. If a node is down for
a longer time, no hints will be written.
Only a repair will help then.

2017-01-23 12:47 GMT+01:00 Abhishek Kumar Maheshwari 
>:
Hi Benjamin,

I found the issue: while making the query, I was overriding LOCAL_QUORUM
with QUORUM.

Also, one more question:

I was able to insert data in DRPOCcluster. But when I bring up the dc_india
DC, the data doesn't appear in the dc_india keyspace and column family (I
waited about 30 minutes)?




Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Benjamin Roth 
[mailto:benjamin.r...@jaumo.com]
Sent: Monday, January 23, 2017 5:05 PM
To: user@cassandra.apache.org
Subject: Re: Getting Error while Writing in Multi DC mode when Remote Dc is 
Down.

The query has QUORUM not LOCAL_QUORUM. So 3 of 5 nodes are required. Maybe 1 
node in DRPOCcluster also was temporarily unavailable during that query?

2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari 
>:
Hi All,

I have Cassandra stack with 2 Dc

Datacenter: DRPOCcluster

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
UN  172.29.xx.xxx  88.88 GB   256  ?   
b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
UN  172.29.xx.xxx  73.95 GB   256  ?   
604abbf5-8639-4104-8f60-fd6573fb2e17  03
UN  172.29. xx.xxx  66.42 GB   256  ?   
32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
Datacenter: dc_india

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
DN  172.26. .xx.xxx  78.97 GB   256  ?   
3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
DN  172.26. .xx.xxx  79.18 GB   256  ?   
7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2


I am using below code to connect with java driver:

cluster = 
Cluster.builder().addContactPoints(hostAddresses).withRetryPolicy(DefaultRetryPolicy.INSTANCE)
   .withReconnectionPolicy(new 
ConstantReconnectionPolicy(3L))
   .withLoadBalancingPolicy(new TokenAwarePolicy(new 
DCAwareRoundRobinPolicy.Builder().withLocalDc("DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
cluster.getConfiguration().getQueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

hostAddresses is 172.29.xx.xxx. When the DC with IP 172.26.xx.xxx is down,
we are getting the exception below:


Exception in thread "main" 
com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas 
available for query at consistency QUORUM (3 required but only 2 alive)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:109)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:27)
   at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
   at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)

Cassandra version : 3.0.9
Datastax Java Driver Version:


com.datastax.cassandra
cassandra-driver-core
3.1.2



Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


Re: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Benjamin Roth
Sorry for the short answer, I am on the run:
I guess your hints expired. The default setting is 3h. If a node is down for
a longer time, no hints will be written.
Only a repair will help then.
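
For reference, the window Benjamin describes is max_hint_window_in_ms in
cassandra.yaml; once a node has been down longer than this, no further
hints are recorded for it:

    # cassandra.yaml (default shown)
    max_hint_window_in_ms: 10800000   # 3 hours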

2017-01-23 12:47 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi Benjamin,
>
>
>
> I found the issue: while making the query, I was overriding LOCAL_QUORUM
> with QUORUM.
>
>
>
> Also, one more Question,
>
>
>
> I was able to insert data in DRPOCcluster. But when I bring up the
> dc_india DC, the data doesn't appear in the dc_india keyspace and column
> family (I waited about 30 minutes)?
>
>
>
>
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Benjamin Roth [mailto:benjamin.r...@jaumo.com]
> *Sent:* Monday, January 23, 2017 5:05 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Getting Error while Writing in Multi DC mode when Remote
> Dc is Down.
>
>
>
> The query has QUORUM not LOCAL_QUORUM. So 3 of 5 nodes are required. Maybe
> 1 node in DRPOCcluster also was temporarily unavailable during that query?
>
>
>
> 2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari  timesinternet.in>:
>
> Hi All,
>
>
>
> I have Cassandra stack with 2 Dc
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.xx.xxx  88.88 GB   256  ?   
> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a
> 01
>
> UN  172.29.xx.xxx  73.95 GB   256  ?   
> 604abbf5-8639-4104-8f60-fd6573fb2e17
> 03
>
> UN  172.29. xx.xxx  66.42 GB   256  ?
> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> DN  172.26. .xx.xxx  78.97 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> DN  172.26. .xx.xxx  79.18 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
> I am using below code to connect with java driver:
>
>
>
> cluster = Cluster.*builder*().addContactPoints(hostAddresses
> ).withRetryPolicy(DefaultRetryPolicy.*INSTANCE*)
>
>.withReconnectionPolicy(*new*
> ConstantReconnectionPolicy(3L))
>
>.withLoadBalancingPolicy(*new*
> TokenAwarePolicy(*new* DCAwareRoundRobinPolicy.Builder().withLocalDc("
> DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
>
> cluster.getConfiguration().getQueryOptions().setConsistencyLevel(
> ConsistencyLevel.LOCAL_QUORUM);
>
>
>
> hostAddresses is 172.29.xx.xxx. When the DC with IP 172.26.xx.xxx is
> down, we are getting the exception below:
>
>
>
>
>
> Exception in thread "main" 
> com.datastax.driver.core.exceptions.UnavailableException:
> Not enough replicas available for query at consistency QUORUM (3 required
> but only 2 alive)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:109)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:27)
>
>at com.datastax.driver.core.DriverThrowables.propagateCause(
> DriverThrowables.java:37)
>
>at com.datastax.driver.core.DefaultResultSetFuture.
> getUninterruptibly(DefaultResultSetFuture.java:245)
>
>
>
> Cassandra version : 3.0.9
>
> Datastax Java Driver Version:
>
>
>
> 
>
> com.datastax.cassandra
>
> cassandra-driver-
> core
>
> 3.1.2
>
> 
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
>
>
>
>
>
> --
>
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth

RE: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Abhishek Kumar Maheshwari
Hi Benjamin,

I found the issue: while making the query, I was overriding LOCAL_QUORUM
with QUORUM.

Also, one more question:

I was able to insert data in DRPOCcluster. But when I bring up the dc_india
DC, the data doesn't appear in the dc_india keyspace and column family (I
waited about 30 minutes)?




Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

From: Benjamin Roth [mailto:benjamin.r...@jaumo.com]
Sent: Monday, January 23, 2017 5:05 PM
To: user@cassandra.apache.org
Subject: Re: Getting Error while Writing in Multi DC mode when Remote Dc is 
Down.

The query has QUORUM not LOCAL_QUORUM. So 3 of 5 nodes are required. Maybe 1 
node in DRPOCcluster also was temporarily unavailable during that query?

2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari 
>:
Hi All,

I have Cassandra stack with 2 Dc

Datacenter: DRPOCcluster

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
UN  172.29.xx.xxx  88.88 GB   256  ?   
b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
UN  172.29.xx.xxx  73.95 GB   256  ?   
604abbf5-8639-4104-8f60-fd6573fb2e17  03
UN  172.29. xx.xxx  66.42 GB   256  ?   
32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
Datacenter: dc_india

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
DN  172.26. .xx.xxx  78.97 GB   256  ?   
3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
DN  172.26. .xx.xxx  79.18 GB   256  ?   
7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2


I am using below code to connect with java driver:

cluster = 
Cluster.builder().addContactPoints(hostAddresses).withRetryPolicy(DefaultRetryPolicy.INSTANCE)
   .withReconnectionPolicy(new 
ConstantReconnectionPolicy(3L))
   .withLoadBalancingPolicy(new TokenAwarePolicy(new 
DCAwareRoundRobinPolicy.Builder().withLocalDc("DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
cluster.getConfiguration().getQueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

hostAddresses is 172.29.xx.xxx. When the DC with IP 172.26.xx.xxx is down,
we are getting the exception below:


Exception in thread "main" 
com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas 
available for query at consistency QUORUM (3 required but only 2 alive)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:109)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:27)
   at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
   at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)

Cassandra version : 3.0.9
Datastax Java Driver Version:


com.datastax.cassandra
cassandra-driver-core
3.1.2



Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.




--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Benjamin Roth
The query has QUORUM not LOCAL_QUORUM. So 3 of 5 nodes are required. Maybe
1 node in DRPOCcluster also was temporarily unavailable during that query?

2017-01-23 12:16 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> Hi All,
>
>
>
> I have Cassandra stack with 2 Dc
>
>
>
> Datacenter: DRPOCcluster
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> UN  172.29.xx.xxx  88.88 GB   256  ?   
> b6b8cbb9-1fed-471f-aea9-6a657e7ac80a
> 01
>
> UN  172.29.xx.xxx  73.95 GB   256  ?   
> 604abbf5-8639-4104-8f60-fd6573fb2e17
> 03
>
> UN  172.29. xx.xxx  66.42 GB   256  ?
> 32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
>
> Datacenter: dc_india
>
> 
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  AddressLoad   Tokens   OwnsHost
> ID   Rack
>
> DN  172.26. .xx.xxx  78.97 GB   256  ?
> 3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
>
> DN  172.26. .xx.xxx  79.18 GB   256  ?
> 7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
>
>
>
>
>
> I am using below code to connect with java driver:
>
>
>
> cluster = Cluster.*builder*().addContactPoints(hostAddresses
> ).withRetryPolicy(DefaultRetryPolicy.*INSTANCE*)
>
>.withReconnectionPolicy(*new*
> ConstantReconnectionPolicy(3L))
>
>.withLoadBalancingPolicy(*new*
> TokenAwarePolicy(*new* DCAwareRoundRobinPolicy.Builder().withLocalDc("
> DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
>
> cluster.getConfiguration().getQueryOptions().setConsistencyLevel(
> ConsistencyLevel.LOCAL_QUORUM);
>
>
>
> hostAddresses is 172.29.xx.xxx. When the DC with IP 172.26.xx.xxx is
> down, we are getting the exception below:
>
>
>
>
>
> Exception in thread "main" 
> com.datastax.driver.core.exceptions.UnavailableException:
> Not enough replicas available for query at consistency QUORUM (3 required
> but only 2 alive)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:109)
>
>at com.datastax.driver.core.exceptions.UnavailableException.copy(
> UnavailableException.java:27)
>
>at com.datastax.driver.core.DriverThrowables.propagateCause(
> DriverThrowables.java:37)
>
>at com.datastax.driver.core.DefaultResultSetFuture.
> getUninterruptibly(DefaultResultSetFuture.java:245)
>
>
>
> Cassandra version : 3.0.9
>
> Datastax Java Driver Version:
>
>
>
> 
>
> com.datastax.cassandra
>
> cassandra-driver-
> core
>
> 3.1.2
>
> 
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Getting Error while Writing in Multi DC mode when Remote Dc is Down.

2017-01-23 Thread Abhishek Kumar Maheshwari
Hi All,

I have Cassandra stack with 2 Dc

Datacenter: DRPOCcluster

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
UN  172.29.xx.xxx  88.88 GB   256  ?   
b6b8cbb9-1fed-471f-aea9-6a657e7ac80a  01
UN  172.29.xx.xxx  73.95 GB   256  ?   
604abbf5-8639-4104-8f60-fd6573fb2e17  03
UN  172.29. xx.xxx  66.42 GB   256  ?   
32fa79ee-93c6-4e5b-a910-f27a1e9d66c1  02
Datacenter: dc_india

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens   OwnsHost ID  
 Rack
DN  172.26. .xx.xxx  78.97 GB   256  ?   
3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
DN  172.26. .xx.xxx  79.18 GB   256  ?   
7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2


I am using below code to connect with java driver:

cluster = 
Cluster.builder().addContactPoints(hostAddresses).withRetryPolicy(DefaultRetryPolicy.INSTANCE)
   .withReconnectionPolicy(new 
ConstantReconnectionPolicy(3L))
   .withLoadBalancingPolicy(new TokenAwarePolicy(new 
DCAwareRoundRobinPolicy.Builder().withLocalDc("DRPOCcluster").withUsedHostsPerRemoteDc(2).build())).build();
cluster.getConfiguration().getQueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

hostAddresses is 172.29.xx.xxx. When the DC with IP 172.26.xx.xxx is down,
we are getting the exception below:


Exception in thread "main" 
com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas 
available for query at consistency QUORUM (3 required but only 2 alive)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:109)
   at 
com.datastax.driver.core.exceptions.UnavailableException.copy(UnavailableException.java:27)
   at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
   at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)

Cassandra version : 3.0.9
Datastax Java Driver Version:


com.datastax.cassandra
cassandra-driver-core
3.1.2



Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.



Re: Kill queries

2017-01-23 Thread CharSyam
There is no way to kill a running query; you would have to turn the nodes
off and back on. Cassandra doesn't support a kill-query feature.

On Monday, 23 January 2017, Cogumelos Maravilha wrote:

> Hi,
>
> I'm using cqlsh --request-timeout=1, but because I have more than
> 600,000,000 rows I sometimes get blocked and kill cqlsh. But what about
> the query still running in Cassandra? How can I check that?
>
> Thanks in advance.
>
>


Kill queries

2017-01-23 Thread Cogumelos Maravilha
Hi,

I'm using cqlsh --request-timeout=1, but because I have more than
600,000,000 rows I sometimes get blocked and kill cqlsh. But what about
the query still running in Cassandra? How can I check that?

Thanks in advance.



question on multi DC setup and LWT's

2017-01-23 Thread Kant Kodali
Hi guys,

Let's say I have 2 DCs, with a 3-node cluster in each DC and one replica in
each DC. I would like to maintain strong consistency and high availability,
so:

1) First of all, how do I even set up one replica on each DC? (a sketch
follows this list)
2) what should my read and write consistency levels be when I am using LWT?
3) what is the difference between QUORUM and SERIAL when using LWT for
both reads and writes?
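
For question 1, a minimal sketch of the keyspace definition (the keyspace
name and DC names are placeholders; use the DC names reported by your
snitch / nodetool status):

    cqlsh -e "CREATE KEYSPACE demo WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};"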

Thanks!


Re: JVM state determined to be unstable. Exiting forcefully. what is Java Stability Inspector ?? why it is stopping DSE?

2017-01-23 Thread chetan kumar
You can check for unevenly distributed data using the nodetool command

*nodetool toppartitions keyspace table*
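
For example (keyspace and table names are placeholders; on most versions
the trailing argument is the sampling duration in milliseconds):

    nodetool toppartitions my_ks my_table 10000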

On Mon, Jan 23, 2017 at 12:52 PM, chetan kumar  wrote:

> Hi Pranay,
>
> It seems that your data is unevenly distributed across the cluster with
> respect to your insertion frequency. Please restructure your partition key.
>
> Thanks
>
> On Fri, Jan 20, 2017 at 6:49 AM, Pranay akula 
> wrote:
>
>> What I have observed is 2-3 old-gen GCs in 1-2 minutes before the OOM,
>> which I rarely see, and I've seen hinted handoffs accumulate on the nodes
>> which went down, and mutation drops as well.
>>
>> I really don't know how to analyse the hprof file; is there any guide or
>> blog that can help me analyse it? Our cluster has 2 DCs, each DC with 18
>> nodes, a 12 GB heap, and a 4 GB new heap.
>>
>>
>> Thanks
>> Pranay.
>>
>> On Thu, Jan 19, 2017 at 8:19 AM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi Pranay,
>>>
>>> what can be the reason for this
>>>
>>>
>>> It can be due to a JVM / GC misconfiguration or to some abnormal
>>> activity in Cassandra. Often, GC issues are a consequence and not the
>>> root cause of an issue in Cassandra.
>>>
>>>
 how to debug that ??
>>>
>>> how to fine grain why on those particular nodes this is happening when
 these nodes are serving same requests like rest of the cluster ??
>>>
>>>
>>> You can enable GC logs on those nodes (use the cassandra-env.sh file to
>>> do so) and have a look at what's happening there. Also, you can have a
>>> look at the system.log files (search for warnings or errors - WARN /
>>> ERROR) and at "nodetool tpstats". I like to use this last command as
>>> "watch -d nodetool tpstats" to see variations.
>>>
>>> Having pending or dropped threads is likely to increase GC activity, as
>>> is having wide rows, many tombstones, and some other cases.
>>>
>>> So, to determine why this is happening, could you share your hardware
>>> specs and the way the JVM / GC is configured (cassandra-env.sh), and let
>>> us know how the nodes are handling threads, along with any relevant
>>> information that might appear in the logs.
>>>
>>> You can investigate the heap dump as well (I believe you can do this
>>> using Eclipse Memory Analyzer - MAT).
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> 2017-01-19 14:00 GMT+01:00 Pranay akula :
>>>
 For the last few days I have been seeing, on some of the nodes in the
 Cassandra cluster, DSE getting shut down due to the error below, and I
 need to kill the Java process and restart the DSE service.

 I have cross-checked reads, writes and compactions, and nothing looks
 suspicious, but I am seeing full GC pauses on these servers just before
 the issue happens. What can be the reason for this, and how do I debug it?
 How can I work out why this happens on those particular nodes when they
 are serving the same requests as the rest of the cluster?

 Is this happening because full GC is not being performed properly? We are
 using G1GC and DSE 4.8.3.


 ERROR [SharedPool-Worker-25] 2016-12-27 10:14:26,100
 JVMStabilityInspector.java:117 - JVM state determined to be unstable.
 Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space

 at java.util.Arrays.copyOf(Arrays.java:3181) ~[na:1.8.0_74]
 at org.apache.cassandra.db.RangeTombstoneList.copy(RangeTombstoneList.java:112) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.DeletionInfo.copy(DeletionInfo.java:104) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:217) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1230) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:359) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54) ~[cassandra-all-2.1.13.1131.jar:2.1.13.1131]
 at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)