Re: Not able to insert data through achilles.

2016-03-02 Thread DuyHai Doan
You're right, it's a bug

 I have created an issue to track the fix here:
https://github.com/doanduyhai/Achilles/issues/240

 Fortunately you can use the query API right now to insert the static
columns:

 // 'session' here is the underlying Java driver Session (obtain it from your driver setup)
 PreparedStatement ps = session.prepare(
     "INSERT INTO teammember_by_team (teamname, manager, location) VALUES (?, ?, ?)");
 BoundStatement bs = ps.bind("raman", "rahul", "india");

 manager.query().nativeQuery(bs).execute();

 Optionally you can also use the update DSL to set those static values:

 manager
 .dsl()
 .updateStatic()
 .fromBaseTable()
 .manager_Set("rahul")
 .location_Set("india")
 .where()
 .teamname_Eq("raman")
 .execute();
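
 (For reference, the CQL such an update maps to should be roughly the following, reusing the schema quoted below; you could also run it directly from cqlsh:)

 cqlsh -e "UPDATE teammember_by_team SET manager = 'rahul', location = 'india' WHERE teamname = 'raman';"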

On Thu, Mar 3, 2016 at 9:02 AM, Raman Gugnani 
wrote:

> Hi
>
> I am trying to insert data into Cassandra using Achilles with only the
> partition key and static columns set (all other columns and the clustering
> key are null), but I am getting this error:
>
> info.archinnov.achilles.exception.AchillesException: Field 'membername'
> in entity of type 'com.xxx.domain.cassandra.entity.TeamMember' should not
> be null because it is a clustering column
>
>
> I am trying the insert below through Achilles:
>
> TeamMember teamMember = new TeamMember();
> teamMember.setTeamname("raman");
> teamMember.setManager("rahul");
> teamMember.setLocation("india");
>
>
> manager.crud().insert(teamMember).withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS).execute();
>
>
>
> But as per the reference link it is possible to insert static columns with
> only the partition key.
>
> Reference link:
> https://blogs.infosupport.com/static-columns-in-cassandra-and-their-benefits/
>
> CREATE TABLE teammember_by_team (
>   teamname text,
>   manager text static,
>   location text static,
>   membername text,
>   nationality text,
>   position text,
>   PRIMARY KEY ((teamname), membername)
> );
>
> INSERT INTO teammember_by_team (teamname, manager, location)
> VALUES ('Red Bull', 'Christian Horner', '');
>
> teamname | membername | location | manager          | nationality | position
> ----------+------------+----------+------------------+-------------+----------
>  Red Bull |       null |          | Christian Horner |        null |     null
>
>
>
>
>
> --
> Thanks & Regards
>
> Raman Gugnani
> *Senior Software Engineer | CaMS*
> M: +91 8588892293 | T: 0124-660 | EXT: 14255
> ASF Centre A | 2nd Floor | CA-2130 | Udyog Vihar Phase IV |
> Gurgaon | Haryana | India
>
>


Not able to insert data through achilles.

2016-03-02 Thread Raman Gugnani
Hi

I am trying to insert data into Cassandra using Achilles with only the
partition key and static columns set (all other columns and the clustering
key are null), but I am getting this error:

info.archinnov.achilles.exception.AchillesException: Field 'membername' in
entity of type 'com.xxx.domain.cassandra.entity.TeamMember' should not be
null because it is a clustering column


I am trying the insert below through Achilles:

TeamMember teamMember = new TeamMember();
teamMember.setTeamname("raman");
teamMember.setManager("rahul");
teamMember.setLocation("india");

manager.crud().insert(teamMember).withInsertStrategy(InsertStrategy.NOT_NULL_FIELDS).execute();



But as per the reference link it is possible to insert static columns with
only the partition key.

Reference link:
https://blogs.infosupport.com/static-columns-in-cassandra-and-their-benefits/

CREATE TABLE teammember_by_team (
  teamname text,
  manager text static,
  location text static,
  membername text,
  nationality text,
  position text,
  PRIMARY KEY ((teamname), membername)
);

INSERT INTO teammember_by_team (teamname, manager, location)
VALUES ('Red Bull', 'Christian Horner', '');

 teamname | membername | location | manager          | nationality | position
----------+------------+----------+------------------+-------------+----------
 Red Bull |       null |          | Christian Horner |        null |     null





-- 
Thanks & Regards

Raman Gugnani
*Senior Software Engineer | CaMS*
M: +91 8588892293 | T: 0124-660 | EXT: 14255
ASF Centre A | 2nd Floor | CA-2130 | Udyog Vihar Phase IV |
Gurgaon | Haryana | India



Re: Removing Node causes bunch of HostUnavailableException

2016-03-02 Thread Alain RODRIGUEZ
Just thought of a solution that might actually work even better.

You could try replacing one node at a time (instead of removing them).

I believe this would decrease the amount of streaming significantly.

https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
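
The core of that procedure is just starting the replacement node with the
replace_address flag set, roughly this in cassandra-env.sh (the IP is a
placeholder for the dead node's address):

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"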

Good luck,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-02 23:57 GMT+01:00 Alain RODRIGUEZ :

> Hi Praveen,
>
>
>> We are not removing multiple nodes at the same time. All dead nodes are
>> from same AZ so there were no errors when the nodes were down as expected
>> (because we use QUORUM)
>
>
> Do you use at least 3 distinct AZs? If so, you should indeed be fine
> regarding data integrity, and repair should then work for you. If you have
> fewer than 3 AZs, then you are in trouble...
>
> About the unreachable errors, I believe they can be caused by the overload
> from the missing nodes. Pressure on the remaining nodes might be too strong.
>
> However, as soon as I started removing nodes one by one, every time
>> we see a lot of timeout and unavailable exceptions which doesn’t make any
>> sense because I am just removing a node that doesn’t even exist.
>>
>
> This probably added even more load: if you are using vnodes, all the
> remaining nodes probably started streaming data to each other at the
> speed of "nodetool getstreamthroughput". The AWS network isn't that good, and
> is probably saturated. Also, have you configured phi_convict_threshold
> to a high value, at least 10 or 12? This would keep nodes from being marked
> down that often.
>
> What does "nodetool tpstats" output?
>
> Also you might try to monitor resources and see what happens (my guess is
> you should focus on iowait, disk usage and network, and keep an eye on CPU too).
>
> A quick fix would probably be to heavily throttle the network on all the
> nodes and see if it helps:
>
> nodetool setstreamthroughput 2
>
> If this works, you could incrementally increase it and monitor, find the
> right tuning and put it in cassandra.yaml.
>
> I opened a ticket a while ago about that issue:
> https://issues.apache.org/jira/browse/CASSANDRA-9509
>
> I hope this will help you to go back to a healthy state allowing you a
> fast upgrade ;-).
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-03-02 22:17 GMT+01:00 Peddi, Praveen :
>
>> Hi Robert,
>> Thanks for your response.
>>
>> Replication factor is 3.
>>
>> We are in the process of upgrading to 2.2.4. We have had too many
>> performance issues with later versions of Cassandra (I have asked for
>> help related to that in the forum). We are close to getting to similar
>> performance now and hopefully upgrade in next few weeks. Lot of testing to
>> do :(.
>>
>> We are not removing multiple nodes at the same time. All dead nodes are
>> from same AZ so there were no errors when the nodes were down as expected
>> (because we use QUORUM). However, as soon as I started removing nodes one
>> by one, every time we see a lot of timeout and unavailable exceptions
>> which doesn’t make any sense because I am just removing a node that doesn’t
>> even exist.
>>
>>
>>
>>
>>
>>
>> From: Robert Coli 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Wednesday, March 2, 2016 at 2:52 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Removing Node causes bunch of HostUnavailableException
>>
>> On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen  wrote:
>>
>>> We have few dead nodes in the cluster (Amazon ASG removed those thinking
>>> there is an issue with health). Now we are trying to remove those dead
>>> nodes from the cluster so that other nodes can take over. As soon as I
>>> execute nodetool removenode , we see lots of HostUnavailableExceptions
>>> both on reads and writes. What I am not able to understand is, these are
>>> deadnodes and don’t even physically exists. Why would removenode command
>>> cause any outage of nodes in Cassandra when we had no errors whatsoever
>>> before removing them. I could not really find a jira ticket for this.
>>>
>>
>> What is your replication factor?
>>
>> Also, 2.0.9 is meaningfully old at this point, consider upgrading ASAP.
>>
>> Also, removing multiple nodes with removenode means your consistency is
>> pretty hosed. Repair ASAP, but there are potential cases where repair won't
>> help.
>>
>> =Rob
>>
>>
>> =Rob
>>
>>
>
>


Re: Removing Node causes bunch of HostUnavailableException

2016-03-02 Thread Alain RODRIGUEZ
Hi Praveen,


> We are not removing multiple nodes at the same time. All dead nodes are
> from same AZ so there were no errors when the nodes were down as expected
> (because we use QUORUM)


Do you use at least 3 distinct AZs? If so, you should indeed be fine
regarding data integrity, and repair should then work for you. If you have
fewer than 3 AZs, then you are in trouble...

About the unreachable errors, I believe they can be caused by the overload
from the missing nodes. Pressure on the remaining nodes might be too strong.

However, as soon as I started removing nodes one by one, every time we
> see a lot of timeout and unavailable exceptions which doesn’t make any sense
> because I am just removing a node that doesn’t even exist.
>

This probably added even more load: if you are using vnodes, all the
remaining nodes probably started streaming data to each other at the
speed of "nodetool getstreamthroughput". The AWS network isn't that good, and
is probably saturated. Also, have you configured phi_convict_threshold
to a high value, at least 10 or 12? This would keep nodes from being marked
down that often.

What does "nodetool tpstats" output?

Also you might try to monitor resources and see what happens (my guess is
you should focus on iowait, disk usage and network, and keep an eye on CPU too).

A quick fix would probably be to heavily throttle the network on all the
nodes and see if it helps:

nodetool setstreamthroughput 2

If this works, you could incrementally increase it and monitor, find the
right tuning and put it in cassandra.yaml.
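
(Something like this to check the cap before and after changing it, and to make
the final value permanent; the property name is the one from the 2.x defaults:)

nodetool getstreamthroughput
# once happy, set stream_throughput_outbound_megabits_per_sec accordingly in cassandra.yaml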

I opened a ticket a while ago about that issue:
https://issues.apache.org/jira/browse/CASSANDRA-9509

I hope this will help you to go back to a healthy state allowing you a fast
upgrade ;-).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-03-02 22:17 GMT+01:00 Peddi, Praveen :

> Hi Robert,
> Thanks for your response.
>
> Replication factor is 3.
>
> We are in the process of upgrading to 2.2.4. We have had too many
> performance issues with later versions of Cassandra (I have asked for
> help related to that in the forum). We are close to getting to similar
> performance now and hopefully upgrade in next few weeks. Lot of testing to
> do :(.
>
> We are not removing multiple nodes at the same time. All dead nodes are
> from same AZ so there were no errors when the nodes were down as expected
> (because we use QUORUM). However, as soon as I started removing nodes one
> by one, every time we see a lot of timeout and unavailable exceptions
> which doesn’t make any sense because I am just removing a node that doesn’t
> even exist.
>
>
>
>
>
>
> From: Robert Coli 
> Reply-To: "user@cassandra.apache.org" 
> Date: Wednesday, March 2, 2016 at 2:52 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: Removing Node causes bunch of HostUnavailableException
>
> On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen  wrote:
>
>> We have few dead nodes in the cluster (Amazon ASG removed those thinking
>> there is an issue with health). Now we are trying to remove those dead
>> nodes from the cluster so that other nodes can take over. As soon as I
>> execute nodetool removenode , we see lots of HostUnavailableExceptions
>> both on reads and writes. What I am not able to understand is, these are
>> deadnodes and don’t even physically exists. Why would removenode command
>> cause any outage of nodes in Cassandra when we had no errors whatsoever
>> before removing them. I could not really find a jira ticket for this.
>>
>
> What is your replication factor?
>
> Also, 2.0.9 is meaningfully old at this point, consider upgrading ASAP.
>
> Also, removing multiple nodes with removenode means your consistency is
> pretty hosed. Repair ASAP, but there are potential cases where repair won't
> help.
>
> =Rob
>
>
> =Rob
>
>


Re: MemtableReclaimMemory pending building up

2016-03-02 Thread Dan Kinder
Also should note: Cassandra 2.2.5, Centos 6.7

On Wed, Mar 2, 2016 at 1:34 PM, Dan Kinder  wrote:

> Hi y'all,
>
> I am writing to a cluster fairly fast and seeing this odd behavior happen,
> seemingly to single nodes at a time. The node starts to take more and more
> memory (instance has 48GB memory on G1GC). tpstats shows that
> MemtableReclaimMemory Pending starts to grow first, then later
> MutationStage builds up as well. By then most of the memory is being
> consumed, GC is getting longer, node slows down and everything slows down
> unless I kill the node. Also the number of Active MemtableReclaimMemory
> threads seems to stay at 1. Also interestingly, neither CPU nor disk
> utilization are pegged while this is going on; it's on jbod and there is
> plenty of headroom there. (Note that there is a decent number of
> compactions going on as well but that is expected on these nodes and this
> particular one is catching up from a high volume of writes).
>
> Anyone have any theories on why this would be happening?
>
>
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked
>  All time blocked
> MutationStage   192715481  311327142 0
> 0
> ReadStage 7 09142871 0
> 0
> RequestResponseStage  1 0  690823199 0
> 0
> ReadRepairStage   0 02145627 0
> 0
> CounterMutationStage  0 0  0 0
> 0
> HintedHandoff 0 0144 0
> 0
> MiscStage 0 0  0 0
> 0
> CompactionExecutor   1224  41022 0
> 0
> MemtableReclaimMemory 1   102   4263 0
> 0
> PendingRangeCalculator0 0 10 0
> 0
> GossipStage   0 0 148329 0
> 0
> MigrationStage0 0  0 0
> 0
> MemtablePostFlush 0 0   5233 0
> 0
> ValidationExecutor0 0  0 0
> 0
> Sampler   0 0  0 0
> 0
> MemtableFlushWriter   0 0   4270 0
> 0
> InternalResponseStage 0 0   16322698 0
> 0
> AntiEntropyStage  0 0  0 0
> 0
> CacheCleanupExecutor  0 0  0 0
> 0
> Native-Transport-Requests25 0  547935519 0
>   2586907
>
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION287057
> COUNTER_MUTATION 0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR149
>
>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


MemtableReclaimMemory pending building up

2016-03-02 Thread Dan Kinder
Hi y'all,

I am writing to a cluster fairly fast and seeing this odd behavior happen,
seemingly to single nodes at a time. The node starts to take more and more
memory (instance has 48GB memory on G1GC). tpstats shows that
MemtableReclaimMemory Pending starts to grow first, then later
MutationStage builds up as well. By then most of the memory is being
consumed, GC is getting longer, node slows down and everything slows down
unless I kill the node. Also the number of Active MemtableReclaimMemory
threads seems to stay at 1. Also interestingly, neither CPU nor disk
utilization are pegged while this is going on; it's on jbod and there is
plenty of headroom there. (Note that there is a decent number of
compactions going on as well but that is expected on these nodes and this
particular one is catching up from a high volume of writes).

Anyone have any theories on why this would be happening?


$ nodetool tpstats
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage   192715481  311327142 0
0
ReadStage 7 09142871 0
0
RequestResponseStage  1 0  690823199 0
0
ReadRepairStage   0 02145627 0
0
CounterMutationStage  0 0  0 0
0
HintedHandoff 0 0144 0
0
MiscStage 0 0  0 0
0
CompactionExecutor   1224  41022 0
0
MemtableReclaimMemory 1   102   4263 0
0
PendingRangeCalculator0 0 10 0
0
GossipStage   0 0 148329 0
0
MigrationStage0 0  0 0
0
MemtablePostFlush 0 0   5233 0
0
ValidationExecutor0 0  0 0
0
Sampler   0 0  0 0
0
MemtableFlushWriter   0 0   4270 0
0
InternalResponseStage 0 0   16322698 0
0
AntiEntropyStage  0 0  0 0
0
CacheCleanupExecutor  0 0  0 0
0
Native-Transport-Requests25 0  547935519 0
  2586907

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION287057
COUNTER_MUTATION 0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR149


Re: Removing Node causes bunch of HostUnavailableException

2016-03-02 Thread Peddi, Praveen
Hi Robert,
Thanks for your response.

Replication factor is 3.

We are in the process of upgrading to 2.2.4. We have had too many performance
issues with later versions of Cassandra (I have asked for help related to
that in the forum). We are close to getting to similar performance now and
hopefully will upgrade in the next few weeks. Lots of testing to do :(.

We are not removing multiple nodes at the same time. All dead nodes are from
the same AZ, so there were no errors when the nodes were down, as expected (because
we use QUORUM). However, as soon as I started removing nodes one by one, every
time we see a lot of timeout and unavailable exceptions, which doesn't make
any sense because I am just removing a node that doesn't even exist.








From: Robert Coli
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, March 2, 2016 at 2:52 PM
To: "user@cassandra.apache.org"
Subject: Re: Removing Node causes bunch of HostUnavailableException

On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen wrote:
We have a few dead nodes in the cluster (Amazon ASG removed those thinking there
is an issue with health). Now we are trying to remove those dead nodes from the 
cluster so that other nodes can take over. As soon as I execute nodetool 
removenode , we see lots of HostUnavailableExceptions both on reads and 
writes. What I am not able to understand is, these are dead nodes and don't even
physically exist. Why would the removenode command cause any outage of nodes in
Cassandra when we had no errors whatsoever before removing them. I could not 
really find a jira ticket for this.

What is your replication factor?

Also, 2.0.9 is meaningfully old at this point, consider upgrading ASAP.

Also, removing multiple nodes with removenode means your consistency is pretty 
hosed. Repair ASAP, but there are potential cases where repair won't help.

=Rob


=Rob



RE: Lot of GC on two nodes out of 7

2016-03-02 Thread Amit Singh F
Hi Anishek,

We too faced a similar problem in 2.0.14, and after doing some research we configured
a few parameters in cassandra.yaml and were able to overcome the GC pauses. Those are:


· memtable_flush_writers : increased from 1 to 3, since in the tpstats output
we can see dropped mutations, which means writes are getting blocked; increasing
this number helps absorb them.

· memtable_total_space_in_mb : defaults to 1/4 of the heap size; it can be lowered
because large long-lived objects create pressure on the heap, so it's better
to reduce it somewhat.

· concurrent_compactors : Alain rightly pointed this out, i.e. reduce it
to 8. You need to try this.
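
In cassandra.yaml those would look roughly like the following (the memtable size
is just an example of "lower than the 1/4-of-heap default"):

memtable_flush_writers: 3
memtable_total_space_in_mb: 1024
concurrent_compactors: 8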

Also, please check whether you have dropped mutations on the other nodes or not.

Hope this helps in your cluster too.

Regards
Amit Singh
From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Wednesday, March 02, 2016 9:33 PM
To: user@cassandra.apache.org
Subject: Re: Lot of GC on two nodes out of 7

Can you post a gist of the output of jstat -gccause (60 seconds worth)?  I 
think it's cool you're willing to experiment with alternative JVM settings but 
I've never seen anyone use max tenuring threshold of 50 either and I can't 
imagine it's helpful.  Keep in mind if your objects are actually reaching that 
threshold it means they've been copied 50x (really really slow) and also you're 
going to end up spilling your eden objects directly into your old gen if your 
survivor is full.  Considering the small amount of memory you're using for heap 
I'm really not surprised you're running into problems.
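
(Something like the following would give 60 one-second samples; the pgrep
pattern is just one common way to find the Cassandra PID:)

jstat -gccause $(pgrep -f CassandraDaemon) 1000 60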

I recommend G1GC + 12GB heap and just let it optimize itself for almost all 
cases with the latest JVM versions.
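
(In cassandra-env.sh terms that would roughly mean dropping the ParNew/CMS flags
listed further down this thread and setting something like:)

MAX_HEAP_SIZE="12G"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"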

On Wed, Mar 2, 2016 at 6:08 AM Alain RODRIGUEZ 
> wrote:
It looks like you are doing good work with this cluster and know a lot about
the JVM, that's good :-).

our machine configurations are : 2 X 800 GB SSD , 48 cores, 64 GB RAM

That's good hardware too.

With 64 GB of RAM I would probably directly give `MAX_HEAP_SIZE=8G` a try on
one of the 2 bad nodes.

I would also probably try lowering `HEAP_NEWSIZE=2G` and using
`-XX:MaxTenuringThreshold=15`, still on the canary node to observe the effects. 
But that's just an idea of something I would try to see the impacts, I don't 
think it will solve your current issues or even make it worse for this node.

Using G1GC would allow you to use a bigger Heap size. Using C*2.1 would allow 
you to store the memtables off-heap. Those are 2 improvements reducing the heap 
pressure that you might be interested in.

I have spent time reading about all other options before including them and a 
similar configuration on our other prod cluster is showing good GC graphs via 
gcviewer.

So, let's look for another reason.

there are MUTATION and READ messages dropped in high number on nodes in 
question and on other 5 nodes it varies between 1-3.

- Is Memory, CPU or disk a bottleneck? Is one of those running at the limits?

concurrent_compactors: 48

Reducing this to 8 would free some space for transactions (R requests). It is 
probably worth a try, even more when compaction is not keeping up and 
compaction throughput is not throttled.

Just found an issue about that: 
https://issues.apache.org/jira/browse/CASSANDRA-7139

Looks like `concurrent_compactors: 8` is the new default.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com






2016-03-02 12:27 GMT+01:00 Anishek Agarwal 
>:
Thanks a lot Alain for the details.
`HEAP_NEWSIZE=4G.` is probably far too high (try 1200M <-> 2G)
`MAX_HEAP_SIZE=6G` might be too low, how much memory is available (You might 
want to keep this as it or even reduce it if you have less than 16 GB of native 
memory. Go with 8 GB if you have a lot of memory.
`-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so far. I 
had luck with values between 4 <--> 16 in the past. I would give  a try with 15.
`-XX:CMSInitiatingOccupancyFraction=70`--> Why not using default - 75 ? Using 
default and then tune from there to improve things is generally a good idea.


we have a lot of reads and writes onto the system so keeping the high new size 
to make sure enough is held in memory including caches / memtables etc --number 
of flush_writers : 4 for us. similarly keeping less in old generation to make 
sure we spend less time with CMS GC most of the data is transient in memory for 
us. Keeping high TenuringThreshold because we don't want objects going to old 
generation and just die in young generation given we have configured large 
survivor spaces.
using occupancyFraction as 70 since
given heap is 4G
survivor space is : 400 mb -- 2 survivor spaces
70 % of 2G (old generation) = 1.4G

so once we are just below 1.4G and we have to move 

Re: Broken links in Apache Cassandra home page

2016-03-02 Thread Robert Coli
On Wed, Mar 2, 2016 at 7:00 AM, Eric Evans  wrote:

> On Tue, Mar 1, 2016 at 8:30 PM, ANG ANG  wrote:
> > "#cassandra channel": http://freenode.net/
>
> The latter, while not presently useful, links to a "coming soon..."
> for Freenode.  It might be pedantic to insist it's not broken, but I
> don't know where else we could point that.  Freenode *is* the network
> hosting the IRC channels, and such that it is, that is their website
> (for now).
>

https://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt

irc://freenode.net/#cassandra

=Rob


Re: Snitch for AWS EC2 nondefaultVPC

2016-03-02 Thread Robert Coli
On Wed, Mar 2, 2016 at 7:21 AM, Arun Sandu  wrote:
>
> All the nodes in both datacenters are in DSE Search Mode(Solr). We may
> have analytics datacenter as well in future. Will this have any impact in
> using Ec2MultiRegionSnitch?
>

This list does not support DSE, but as I understand it, they create a faux
DC for use for analytics.

EC2MRSnitch forces your DC to be what amazon says for host region. It
stands to reason that if it's forced to host region, it can't be set to the
value of "faux DC for analytics".

=Rob


Re: Removing Node causes bunch of HostUnavailableException

2016-03-02 Thread Robert Coli
On Wed, Mar 2, 2016 at 8:10 AM, Peddi, Praveen  wrote:

> We have few dead nodes in the cluster (Amazon ASG removed those thinking
> there is an issue with health). Now we are trying to remove those dead
> nodes from the cluster so that other nodes can take over. As soon as I
> execute nodetool removenode , we see lots of HostUnavailableExceptions
> both on reads and writes. What I am not able to understand is, these are
> deadnodes and don’t even physically exists. Why would removenode command
> cause any outage of nodes in Cassandra when we had no errors whatsoever
> before removing them. I could not really find a jira ticket for this.
>

What is your replication factor?

Also, 2.0.9 is meaningfully old at this point, consider upgrading ASAP.

Also, removing multiple nodes with removenode means your consistency is
pretty hosed. Repair ASAP, but there are potential cases where repair won't
help.
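
For example, something like this on each node, one node at a time:

nodetool repair -pr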

=Rob


=Rob


Re: Snitch for AWS EC2 nondefaultVPC

2016-03-02 Thread Arun Sandu
Thanks Robert,

All the nodes in both datacenters are in DSE Search mode (Solr). We may have
an analytics datacenter as well in the future. Will this have any impact on using
Ec2MultiRegionSnitch?

On Tue, Mar 1, 2016 at 7:10 PM, Robert Coli  wrote:

> On Tue, Mar 1, 2016 at 12:12 PM, Arun Sandu 
> wrote:
>
>> All our nodes are launched in AWS EC2 VPC (private). We have 2
>> datacenters(1 us-east , 1- asiapacific) and all communication is through
>> private IP's and don't have any public IPs. What is the recommended snitch
>> to be used? We currently have GossipingPropertyFileSnitch.
>>
>
> I recommend using GPFS unless you're absolutely certain you will never
> want to rely on any host but Amazon, and you will never want to (for
> example) have an analytics pseudo-datacenter within AWS.
>
> =Rob
>
>


-- 
Arun


Re: Snitch for AWS EC2 nondefaultVPC

2016-03-02 Thread Arun Sandu
Thanks Asher.

Yes, I agree. It would be better if someone can help us with clear
documentation about this.

As the cross-datacenter communication is through private IPs, I would consider
updating broadcast_address to the private IP and using Ec2MultiRegionSnitch.

On Tue, Mar 1, 2016 at 3:26 PM, Asher Newcomer  wrote:

> Hi Arun,
>
> This distinction has been a can of worms for me also - and I'm not sure my
> understanding is entirely correct.
>
> I use GossipingPropertyFileSnitch for my multi-region setup, which seems
> to be more flexible than the Ec2 snitches. The Ec2 snitches should work
> also, but their behavior is more opaque from my perspective.
>
> AFAIK - if all of your nodes can reach each other via private IP, and your
> anticipated clients can reach all nodes via their private IP, then using
> the private IP address as the broadcast_address is fine.
>
> If there will ever be a situation where a client or node will need to
> reach some part of the cluster using public IPs, then public IPs should be
> used as the broadcast_address.
>
> A simple flow-chart / diagram of how these various settings are used by
> Cassandra would be very helpful for people new to the project.
>
> Regards,
>
> Asher
>
> On Tue, Mar 1, 2016 at 3:12 PM, Arun Sandu  wrote:
>
>> Hi all,
>>
>> All our nodes are launched in AWS EC2 VPC (private). We have 2
>> datacenters(1 us-east , 1- asiapacific) and all communication is through
>> private IP's and don't have any public IPs. What is the recommended snitch
>> to be used? We currently have GossipingPropertyFileSnitch.
>>
>> 1. If Ec2MultiRegionSnitch, then what would be the broadcast_address?
>> 2. If not Ec2MultiRegionSnitch, which snitch better fits this environment?
>>
>> *Ref:*
>> As per the Ec2MultiRegionSnitch documentation, set
>> the listen_address
>> to the *private* IP address of the node, and the broadcast_address
>> to the *public* IP address of the node.
>>
>> --
>> Thanks
>> Arun
>>
>
>


-- 
Thanks
Arun


Re: Broken links in Apache Cassandra home page

2016-03-02 Thread Eric Evans
On Tue, Mar 1, 2016 at 8:30 PM, ANG ANG  wrote:
> Reference:
> http://stackoverflow.com/questions/35712166/broken-links-in-apache-cassandra-home-page/35724686#35724686
>
> The following links are broken in the Apache Cassandra Home/Welcome page:
>
> "materialized views":
> http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
> "#cassandra channel": http://freenode.net/

The former seems to have been fixed by tjake (see his SO response).
The latter, while not presently useful, links to a "coming soon..."
for Freenode.  It might be pedantic to insist it's not broken, but I
don't know where else we could point that.  Freenode *is* the network
hosting the IRC channels, and such that it is, that is their website
(for now).

> Is this the right forum to notify the community about this type of issues
> (e.g., outdated documentation, broken links)?

There is a "Documentation and Website" component in Jira
(https://issues.apache.org/jira/browse/CASSANDRA) you could use if you
like.  Raising the issue here is probably enough though.

Thanks for the report,

-- 
Eric Evans
eev...@wikimedia.org


Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Alain RODRIGUEZ
It looks like you are doing good work with this cluster and know a lot
about the JVM, that's good :-).

our machine configurations are : 2 X 800 GB SSD , 48 cores, 64 GB RAM


That's good hardware too.

With 64 GB of RAM I would probably directly give `MAX_HEAP_SIZE=8G` a try
on one of the 2 bad nodes.

I would also probably try lowering `HEAP_NEWSIZE=2G` and using
`-XX:MaxTenuringThreshold=15`, still on the canary node to observe the
effects. But that's just an idea of something I would try to see the
impacts, I don't think it will solve your current issues or even make it
worse for this node.
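
In cassandra-env.sh terms, that canary test would be something like:

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"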

Using G1GC would allow you to use a bigger Heap size. Using C*2.1 would
allow you to store the memtables off-heap. Those are 2 improvements
reducing the heap pressure that you might be interested in.

I have spent time reading about all other options before including them and
> a similar configuration on our other prod cluster is showing good GC graphs
> via gcviewer.


So, let's look for another reason.

there are MUTATION and READ messages dropped in high number on nodes in
> question and on other 5 nodes it varies between 1-3.


- Is Memory, CPU or disk a bottleneck? Is one of those running at the
limits?

concurrent_compactors: 48


Reducing this to 8 would free some space for transactions (R requests).
It is probably worth a try, even more when compaction is not keeping up and
compaction throughput is not throttled.

Just found an issue about that:
https://issues.apache.org/jira/browse/CASSANDRA-7139

Looks like `concurrent_compactors: 8` is the new default.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com






2016-03-02 12:27 GMT+01:00 Anishek Agarwal :

> Thanks a lot Alain for the details.
>
>> `HEAP_NEWSIZE=4G.` is probably far too high (try 1200M <-> 2G)
>> `MAX_HEAP_SIZE=6G` might be too low, how much memory is available (You
>> might want to keep this as it or even reduce it if you have less than 16 GB
>> of native memory. Go with 8 GB if you have a lot of memory.
>> `-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so
>> far. I had luck with values between 4 <--> 16 in the past. I would give  a
>> try with 15.
>> `-XX:CMSInitiatingOccupancyFraction=70`--> Why not using default - 75 ?
>> Using default and then tune from there to improve things is generally a
>> good idea.
>
>
>
> we have a lot of reads and writes onto the system so keeping the high new
> size to make sure enough is held in memory including caches / memtables etc
> --number of flush_writers : 4 for us. similarly keeping less in old
> generation to make sure we spend less time with CMS GC most of the data is
> transient in memory for us. Keeping high TenuringThreshold because we don't
> want objects going to old generation and just die in young generation given
> we have configured large survivor spaces.
> using occupancyFraction as 70 since
> given heap is 4G
> survivor space is : 400 mb -- 2 survivor spaces
> 70 % of 2G (old generation) = 1.4G
>
> so once we are just below 1.4G and we have to move the full survivor +
> some extra during a par new gc due to promotion failure, everything will
> fit in old generation, and will trigger CMS.
>
> I have spent time reading about all other options before including them
> and a similar configuration on our other prod cluster is showing good GC
> graphs via gcviewer.
>
> tp stats on all machines show flush writer blocked at : 0.3% of total
>
> the two nodes in question have stats almost as below
>
>- specifically there are pending was in readStage, MutationStage and
>RequestResponseStage
>
> Pool NameActive   Pending  Completed   Blocked
> All time blocked
>
> ReadStage2119 2141798645 0
> 0
>
> RequestResponseStage  0 1  803242391 0
> 0
>
> MutationStage 0 0  291813703 0
> 0
>
> ReadRepairStage   0 0  200544344 0
> 0
>
> ReplicateOnWriteStage 0 0  0 0
> 0
>
> GossipStage   0 0 292477 0
> 0
>
> CacheCleanupExecutor  0 0  0 0
> 0
>
> MigrationStage0 0  0 0
> 0
>
> MemoryMeter   0 0   2172 0
> 0
>
> FlushWriter   0 0   2756 0
> 6
>
> ValidationExecutor0 0101 0
> 0
>
> InternalResponseStage 0 0  0 0
> 0
>
> AntiEntropyStage  0 0202 

Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
Thanks a lot Alain for the details.

> `HEAP_NEWSIZE=4G.` is probably far too high (try 1200M <-> 2G)
> `MAX_HEAP_SIZE=6G` might be too low, how much memory is available (You
> might want to keep this as it or even reduce it if you have less than 16 GB
> of native memory. Go with 8 GB if you have a lot of memory.
> `-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so
> far. I had luck with values between 4 <--> 16 in the past. I would give  a
> try with 15.
> `-XX:CMSInitiatingOccupancyFraction=70`--> Why not using default - 75 ?
> Using default and then tune from there to improve things is generally a
> good idea.



We have a lot of reads and writes on the system, so we keep the new size high
to make sure enough is held in memory, including caches / memtables etc.
(number of flush_writers: 4 for us). Similarly, we keep less in the old
generation to make sure we spend less time in CMS GC, since most of the data is
transient in memory for us. We keep TenuringThreshold high because we don't
want objects going to the old generation; they should just die in the young
generation, given we have configured large survivor spaces.
using occupancyFraction as 70 since
given heap is 4G
survivor space is : 400 mb -- 2 survivor spaces
70 % of 2G (old generation) = 1.4G

so once we are just below 1.4G and we have to move the full survivor + some
extra during a par new gc due to promotion failure, everything will fit in
old generation, and will trigger CMS.

I have spent time reading about all other options before including them and
a similar configuration on our other prod cluster is showing good GC graphs
via gcviewer.

tp stats on all machines show flush writer blocked at : 0.3% of total

the two nodes in question have stats almost as below

   - specifically there are pending tasks in ReadStage, MutationStage and
   RequestResponseStage

Pool NameActive   Pending  Completed   Blocked  All
time blocked

ReadStage2119 2141798645 0
0

RequestResponseStage  0 1  803242391 0
0

MutationStage 0 0  291813703 0
0

ReadRepairStage   0 0  200544344 0
0

ReplicateOnWriteStage 0 0  0 0
0

GossipStage   0 0 292477 0
0

CacheCleanupExecutor  0 0  0 0
0

MigrationStage0 0  0 0
0

MemoryMeter   0 0   2172 0
0

FlushWriter   0 0   2756 0
6

ValidationExecutor0 0101 0
0

InternalResponseStage 0 0  0 0
0

AntiEntropyStage  0 0202 0
0

MemtablePostFlusher   0 0   4395 0
0

MiscStage 0 0  0 0
0

PendingRangeCalculator0 0 20 0
0

CompactionExecutor4 4  49323 0
0

commitlog_archiver0 0  0 0
0

HintedHandoff 0 0116 0
0


Message type   Dropped

RANGE_SLICE  0

READ_REPAIR 36

PAGED_RANGE  0

BINARY   0

READ 11471

MUTATION   898

_TRACE   0

REQUEST_RESPONSE 0

COUNTER_MUTATION 0

all the other 5 nodes show no pending numbers.


our machine configurations are : 2 X 800 GB SSD , 48 cores, 64 GB RAM
compaction throughput is 0 MB/s
concurrent_compactors: 48
flush_writers: 4


> I think Jeff is trying to spot a wide row messing with your system, so
> looking at the max row size on those nodes compared to other is more
> relevant than average size for this check.


I think this is what you are looking for; please correct me if I am wrong.

Compacted partition maximum bytes: 1629722
similar value on all 7 nodes.

grep -i "ERROR" /var/log/cassandra/system.log


There are MUTATION and READ messages dropped in high numbers on the nodes in
question, while on the other 5 nodes it varies between 1-3.

On Wed, Mar 2, 2016 at 4:15 PM, Alain RODRIGUEZ  wrote:

> Hi Anishek,
>
> Even if it highly depends on your workload, here are my thoughts:
>
> `HEAP_NEWSIZE=4G.` is probably far too high (try 1200M <-> 2G)
> `MAX_HEAP_SIZE=6G` might be too low, how much memory is available (You
> might want to keep this as it or even reduce it if you have less than 16 GB
> of native memory. 

Re: Corrupt SSTables

2016-03-02 Thread Alain RODRIGUEZ
Hi Fred,

Corrupted data due to software is quite rare nowadays. I would first have a look
at the filesystem to see if everything is OK. I recently had a case
where the FS was unmounted and mounted back in read-only mode; Cassandra did
not like it.


   1. You should indeed give a try to:

   nodetool scrub system compaction_history

   2. If this is not working you can bring the node down and run the
   offline scrub (see the command sketch just after this list)


   
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableScrub_t.html

   3. "If running scrub on those sstables doesn't help would it be safe to
   delete those SSTables?"
   I am not sure about this one. I would probably go safely by dropping the
   node, cleaning it and bringing it back (using replace_address -->
   
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html).
   Or even remove the server if using a cloud; don't let instances bother
   you if you're in the cloud, really...
   Yet I think this drop on sstable_activity + compaction_history is safe.
   I am just not sure about it.
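
For step 2, with the node stopped, the offline invocation would be roughly the
following (keyspace and table names taken from the FSReadError paths above):

sstablescrub system compaction_history
sstablescrub system sstable_activity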

> From my understanding, the system tables are local (not replicated) which
> means that removing those sstables and then running repair won't help.


Correct, you will not be able to run repair on the system keyspace due to
this (it makes no sense).

Hope this will help. If you find out that it is safe (or not) to remove
these sstables, I'll be happy to learn about that.

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-03-02 11:10 GMT+01:00 Fredrik Al :

> Hi all.
>
> Having two FSReadErrors:
>
> FSReadError in
> ..\..\data\system\compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca\system-compaction_history-ka-329-CompressionInfo.db
>
> FSReadError in
> ..\..\data\system\sstable_activity-5a1ff267ace03f128563cfae6103c65e\system-sstable_activity-ka-475-CompressionInfo.db
>
> From my understanding, the system tables are local (not replicated) which
> means that removing those sstables and then running repair won't help.
>
> If running scrub on those sstables doesn't help would it be safe to delete
> those SSTables?
>
> /Fred
>
>


Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Alain RODRIGUEZ
Hi Anishek,

Even if it highly depends on your workload, here are my thoughts:

`HEAP_NEWSIZE=4G` is probably far too high (try 1200M <-> 2G).
`MAX_HEAP_SIZE=6G` might be too low; how much memory is available? (You
might want to keep this as is, or even reduce it, if you have less than 16 GB
of native memory. Go with 8 GB if you have a lot of memory.)
`-XX:MaxTenuringThreshold=50` is the highest value I have seen in use so
far. I had luck with values between 4 <--> 16 in the past. I would give a
try with 15.
`-XX:CMSInitiatingOccupancyFraction=70` --> Why not use the default of 75?
Using the default and then tuning from there to improve things is generally a
good idea.

You also use a bunch of options I don't know about. If you are uncertain
about them, you could try a default conf without the options you added,
just using the changes above on top of the defaults:
https://github.com/apache/cassandra/blob/cassandra-2.0/conf/cassandra-env.sh.
Or you might find more useful information on a nice reference about this
topic which is Al Tobey's blog post about tuning 2.1. Go to the 'Java
Virtual Machine' part:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

FWIW, I also saw improvement in the past by upgrading to 2.1, Java 8 and
G1GC. G1GC is supposed to be easier to configure too.

the average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3 but the problem is only on two nodes.
>

I think Jeff is trying to spot a wide row messing with your system, so
looking at the max row size on those nodes compared to other is more
relevant than average size for this check.

the only other thing that stands out in cfstats is the read time and write
> time on the nodes with high GC is 5-7 times higher than other 5 nodes, but
> i think thats expected.


I would probably look at this the reverse way: I imagine that extra GC  is
a consequence of something going wrong on those nodes as JVM / GC are
configured the same way cluster-wide. GC / JVM issues are often due to
Cassandra / system / hardware issues, inducing extra pressure on the JVM. I
would try to tune JVM / GC only once the system is healthy. So I often saw
high GC being a consequence rather than the root cause of an issue.

To explore this possibility:

Does this command show some dropped or blocked tasks? This would add
pressure to heap.
nodetool tpstats

Do you have errors in logs? Always good to know when facing an issue.
grep -i "ERROR" /var/log/cassandra/system.log

How are compactions tuned (throughput + concurrent compactors)? This tuning
might explain compactions not keeping up or a high GC pressure.
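
A quick way to check that on a node is something like this (the config path
assumes a default package install):

nodetool compactionstats
grep -E "compaction_throughput|concurrent_compactors" /etc/cassandra/cassandra.yaml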

What are your disks / CPUs? This would help us give you good starting values to
try.

Is there some iowait ? Could point to a bottleneck or bad hardware.
iostat -mx 5 100

...

Hope one of those will point you to an issue, but there are many more things
you could check.

Let us know how it goes,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-03-02 10:33 GMT+01:00 Anishek Agarwal :

> also MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.
>
> On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal  wrote:
>
>> Hey Jeff,
>>
>> one of the nodes with high GC has 1400 SST tables, all other nodes have
>> about 500-900 SST tables. the other node with high GC has 636 SST tables.
>>
>> the average row size for compacted partitions is about 1640 bytes on all
>> nodes. We have replication factor 3 but the problem is only on two nodes.
>> the only other thing that stands out in cfstats is the read time and
>> write time on the nodes with high GC is 5-7 times higher than other 5
>> nodes, but i think thats expected.
>>
>> thanks
>> anishek
>>
>>
>>
>>
>> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa 
>> wrote:
>>
>>> Compaction falling behind will likely cause additional work on reads
>>> (more sstables to merge), but I’d be surprised if it manifested in super
>>> long GC. When you say twice as many sstables, how many is that?.
>>>
>>> In cfstats, does anything stand out? Is max row size on those nodes
>>> larger than on other nodes?
>>>
>>> What you don’t show in your JVM options is the new gen size – if you do
>>> have unusually large partitions on those two nodes (especially likely if
>>> you have rf=2 – if you have rf=3, then there’s probably a third node
>>> misbehaving you haven’t found yet), then raising new gen size can help
>>> handle the garbage created by reading large partitions without having to
>>> tolerate the promotion. Estimates for the amount of garbage vary, but it
>>> could be “gigabytes” of garbage on a very wide partition (see
>>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in
>>> progress to help mitigate that type of pain).
>>>
>>> - Jeff
>>>
>>> From: Anishek Agarwal
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Tuesday, 

Corrupt SSTables

2016-03-02 Thread Fredrik Al
Hi all.

Having two FSReadErrors:

FSReadError in
..\..\data\system\compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca\system-compaction_history-ka-329-CompressionInfo.db

FSReadError in
..\..\data\system\sstable_activity-5a1ff267ace03f128563cfae6103c65e\system-sstable_activity-ka-475-CompressionInfo.db

From my understanding, the system tables are local (not replicated), which
means that removing those sstables and then running repair won't help.

If running scrub on those sstables doesn't help, would it be safe to delete
those SSTables?

/Fred


Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
also MAX_HEAP_SIZE=6G and HEAP_NEWSIZE=4G.

On Wed, Mar 2, 2016 at 1:40 PM, Anishek Agarwal  wrote:

> Hey Jeff,
>
> one of the nodes with high GC has 1400 SST tables, all other nodes have
> about 500-900 SST tables. the other node with high GC has 636 SST tables.
>
> the average row size for compacted partitions is about 1640 bytes on all
> nodes. We have replication factor 3 but the problem is only on two nodes.
> the only other thing that stands out in cfstats is the read time and write
> time on the nodes with high GC is 5-7 times higher than other 5 nodes, but
> i think thats expected.
>
> thanks
> anishek
>
>
>
>
> On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa 
> wrote:
>
>> Compaction falling behind will likely cause additional work on reads
>> (more sstables to merge), but I’d be surprised if it manifested in super
>> long GC. When you say twice as many sstables, how many is that?.
>>
>> In cfstats, does anything stand out? Is max row size on those nodes
>> larger than on other nodes?
>>
>> What you don’t show in your JVM options is the new gen size – if you do
>> have unusually large partitions on those two nodes (especially likely if
>> you have rf=2 – if you have rf=3, then there’s probably a third node
>> misbehaving you haven’t found yet), then raising new gen size can help
>> handle the garbage created by reading large partitions without having to
>> tolerate the promotion. Estimates for the amount of garbage vary, but it
>> could be “gigabytes” of garbage on a very wide partition (see
>> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in
>> progress to help mitigate that type of pain).
>>
>> - Jeff
>>
>> From: Anishek Agarwal
>> Reply-To: "user@cassandra.apache.org"
>> Date: Tuesday, March 1, 2016 at 11:12 PM
>> To: "user@cassandra.apache.org"
>> Subject: Lot of GC on two nodes out of 7
>>
>> Hello,
>>
>> we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
>> configurations, all our writes /  reads use the TokenAware Policy wrapping
>> a DCAware policy. All nodes are part of same Datacenter.
>>
>> We are seeing that two nodes are having high GC collection times. They
>> mostly seem to spend about 300-600 ms in GC. This also seems to
>> result in higher CPU utilisation on these machines. The other 5 nodes don't
>> have this problem.
>>
>> There is no additional repair activity going on the cluster, we are not
>> sure why this is happening.
>> we checked cfhistograms on the two CF we have in the cluster and number
>> of reads seems to be almost same.
>>
>> we also used cfstats to see the number of sstables on each node, and one
>> of the nodes with the above problem has twice as many sstables as the
>> other nodes. This still does not explain why two nodes have high GC
>> overheads. Our GC config is as below:
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>>
>> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>>
>> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>>
>> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>>
>> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>>
>> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>>
>> # earlier value 131072 = 32768 * 4
>>
>> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>>
>> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>>
>> #new
>>
>> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>>
>> We are using cassandra 2.0.17. If anyone has any suggestion as to how
>> what else we can look for to understand why this is happening please do
>> reply.
>>
>>
>>
>> Thanks
>> anishek
>>
>>
>>
>


Re: DATA replication from Oracle DB to Cassandra

2016-03-02 Thread Hannu Kröger
Hi,

I have once implemented one-way replication from an RDBMS to Cassandra using
triggers on the source database side. If you timestamp the changes at the
source, it's possible to apply the same timestamps on the Cassandra side as well,
and that takes care of a lot of the ordering of the changes. This assumes that
your data model doesn't change too much.

In practice:
- Triggers push change events to a commit log, and that is pushed to a queue
- Readers on the Cassandra side read the events from the queue and write them to
Cassandra with the timestamp from the change event
- Cassandra handles the ordering of the change events

Using timestamps you can resend changes, read in events in any order, etc. If
you screw up the replication somehow (we did, many times), it was easy to just
create a dump on the source and load it in again with timestamps, so that the
system kept running the whole time.

This way it’s possible to achieve quite low latency (seconds, not minutes) for 
the replication.
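
On the Cassandra side this boils down to writing with an explicit timestamp; the
keyspace, table, columns and timestamp value below are placeholders, the point is
the USING TIMESTAMP clause:

cqlsh -e "INSERT INTO ks.tbl (id, val) VALUES (42, 'x') USING TIMESTAMP 1456912345000000;"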

Cheers,
Hannu

> On 02 Mar 2016, at 03:11, anil_ah  wrote:
> 
> Hi,
> I want to run a Spark job to do an incremental sync from Oracle to
> Cassandra; the job interval could be one minute. We are looking for real-time
> replication with a latency of 1 or 2 minutes.
> 
> Please advise what would be the best approach:
> 
> 1) Oracle DB -> Spark SQL -> Spark -> Cassandra
> 2) Oracle DB -> Sqoop -> Cassandra
> 
> Please advise which option is better in terms of scalability, incremental loading, etc.
> 
> Regards 
> Anil
> 
> 
> 
> Sent from my Samsung device



Re: Lot of GC on two nodes out of 7

2016-03-02 Thread Anishek Agarwal
Hey Jeff,

One of the nodes with high GC has 1400 SSTables; all other nodes have
about 500-900 SSTables. The other node with high GC has 636 SSTables.

The average row size for compacted partitions is about 1640 bytes on all
nodes. We have replication factor 3 but the problem is only on two nodes.
The only other thing that stands out in cfstats is that the read time and write
time on the nodes with high GC are 5-7 times higher than on the other 5 nodes,
but I think that's expected.

thanks
anishek




On Wed, Mar 2, 2016 at 1:09 PM, Jeff Jirsa 
wrote:

> Compaction falling behind will likely cause additional work on reads (more
> sstables to merge), but I’d be surprised if it manifested in super long GC.
> When you say twice as many sstables, how many is that?.
>
> In cfstats, does anything stand out? Is max row size on those nodes larger
> than on other nodes?
>
> What you don’t show in your JVM options is the new gen size – if you do
> have unusually large partitions on those two nodes (especially likely if
> you have rf=2 – if you have rf=3, then there’s probably a third node
> misbehaving you haven’t found yet), then raising new gen size can help
> handle the garbage created by reading large partitions without having to
> tolerate the promotion. Estimates for the amount of garbage vary, but it
> could be “gigabytes” of garbage on a very wide partition (see
> https://issues.apache.org/jira/browse/CASSANDRA-9754 for work in progress
> to help mitigate that type of pain).
>
> - Jeff
>
> From: Anishek Agarwal
> Reply-To: "user@cassandra.apache.org"
> Date: Tuesday, March 1, 2016 at 11:12 PM
> To: "user@cassandra.apache.org"
> Subject: Lot of GC on two nodes out of 7
>
> Hello,
>
> we have a cassandra cluster of 7 nodes, all of them have the same JVM GC
> configurations, all our writes /  reads use the TokenAware Policy wrapping
> a DCAware policy. All nodes are part of same Datacenter.
>
> We are seeing that two nodes are having high GC collection times. They
> mostly seem to spend about 300-600 ms in GC. This also seems to
> result in higher CPU utilisation on these machines. The other 5 nodes don't
> have this problem.
>
> There is no additional repair activity going on the cluster, we are not
> sure why this is happening.
> we checked cfhistograms on the two CF we have in the cluster and number of
> reads seems to be almost same.
>
> we also used cfstats to see the number of sstables on each node, and one
> of the nodes with the above problem has twice as many sstables as the
> other nodes. This still does not explain why two nodes have high GC
> overheads. Our GC config is as below:
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=50"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>
> JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=256m"
>
> JVM_OPTS="$JVM_OPTS -XX:+AggressiveOpts"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>
> JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=48"
>
> JVM_OPTS="$JVM_OPTS -XX:-ExplicitGCInvokesConcurrent"
>
> JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>
> JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>
> JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>
> # earlier value 131072 = 32768 * 4
>
> JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=131072"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSScheduleRemarkEdenSizeThreshold=104857600"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSRescanMultiple=32768"
>
> JVM_OPTS="$JVM_OPTS -XX:CMSConcMarkMultiple=32768"
>
> #new
>
> JVM_OPTS="$JVM_OPTS -XX:+CMSConcurrentMTEnabled"
>
> We are using cassandra 2.0.17. If anyone has any suggestion as to how what
> else we can look for to understand why this is happening please do reply.
>
>
>
> Thanks
> anishek
>
>
>